I
I-spin A quantum-mechanical variable or quantum number applied to quarks and their compounds, the strongly interacting fundamental hadrons, and the compounds of those hadrons (such as nuclear states) to facilitate consideration of the consequences of the charge independence of the strong (nuclear) forces. This variable is also known as isotopic spin, isobaric spin, and isospin. The many strongly interacting particles (hadrons) and the compounds of these particles, such as nuclei, are observed to form sets or multiplets such that the members of the multiplet differ in their electric charges, magnetic moments, and other electromagnetic properties but are otherwise almost identical. For example, the neutron and proton, with electric charges that are zero and plus one (in units of the magnitude of the electron charge), form a set of two such states. The pions, one with a unit of positive charge, one with zero charge, and one with a unit of negative charge, form a set of three. It appears that if the effects of electromagnetic forces and the closely related weak nuclear forces (responsible for beta decay) are neglected, leaving only the strong forces effective, the different members of such a multiplet are equivalent and cannot be distinguished in their strong interactions. The strong interactions are thus independent of the different electric charges held by different members of the set; they are charge-independent. See ELEMENTARY PARTICLE; FUNDAMENTAL INTERACTIONS; HADRON; STRONG NUCLEAR INTERACTIONS. The I-spin (I) of such a set or multiplet of equivalent states is defined such that Eq. (1) is satisfied,

N = 2I + 1    (1)
where N is the number of states in the set. Another quantum number I3, called the third component of I-spin, is used to differentiate the members of a multiplet, where the values of I3 vary from +I to −I in units of one. The charge Q of a state and the value of I3 for this state are connected by the Gell-Mann–Nishijima relation, Eq. (2),

Q = I3 + Y/2    (2)

where Y, the charge offset, is called
hypercharge. For nuclear states, Y is simply the number of nucleons. Electric charge is conserved in all interactions; Y is observed to be conserved by the strong forces so that I3 is conserved in the course of interactions mediated by the strong forces. See HYPERCHARGE. Similarity to spin. This description of a multiplet of states with I-spin is similar to the quantum-mechanical description of a particle with a total angular momentum or spin of j (in units of ℏ, Planck's constant divided by 2π). Such a particle can be considered as a set of states which differ in their orientation or component of spin jz in a z direction of quantization. There are 2j + 1 such states, where jz varies from −j to +j in steps of one unit. To the extent that the local universe is isotropic (or there are no external forces on the states that depend upon direction), the components of angular momentum in any direction are conserved, and states with different values of jz are dynamically equivalent. There is then a logical or mathematical equivalence between the descriptions of (1) a multiplet of states of definite I and different values of I3 with respect to charge-independent forces and (2) a multiplet of states of a particle with a definite spin j and different values of jz with respect to direction-independent forces. In each case, the members of the multiplet with different values of the conserved quantity I3 on the one hand and jz on the other are dynamically equivalent; that is, they are indistinguishable by any application of the forces in question. See ANGULAR MOMENTUM; SPIN (QUANTUM MECHANICS).
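The counting rule N = 2I + 1 and the charge relation Q = I3 + Y/2 can be verified directly. The following short Python sketch (an illustrative addition, not part of the original entry) lists the members of the nucleon doublet and the pion triplet together with their charges.

```python
from fractions import Fraction as F

def multiplet_charges(I, Y):
    """Return (I3, Q) pairs for the 2I + 1 members of an I-spin multiplet."""
    n = int(2 * I + 1)                      # Eq. (1): N = 2I + 1
    i3_values = [-I + k for k in range(n)]  # I3 runs from -I to +I in unit steps
    return [(i3, i3 + F(Y) / 2) for i3 in i3_values]  # Eq. (2): Q = I3 + Y/2

# Nucleon doublet: I = 1/2, Y = 1 -> charges 0 (neutron) and +1 (proton)
print("nucleons:", multiplet_charges(F(1, 2), 1))
# Pion triplet: I = 1, Y = 0 -> charges -1, 0, +1
print("pions:   ", multiplet_charges(F(1), 0))
```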
Relative intensities of virtual transitions determined by isobaric spin symmetry

Transition        Relative intensity
p → n + π+        2/3
p → p + π0        1/3
n → n + π0        1/3
n → p + π−        2/3
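The entries in the table are the squared Clebsch-Gordan coefficients for coupling the pion triplet (I = 1) to the nucleon doublet (I = 1/2). A minimal sketch that reproduces them, assuming SymPy and its clebsch_gordan routine are available:

```python
from sympy import S
from sympy.physics.wigner import clebsch_gordan

half = S(1) / 2
# (transition, pion I3, final-nucleon I3, initial-nucleon I3)
transitions = [
    ("p -> n + pi+",  1, -half, +half),
    ("p -> p + pi0",  0, +half, +half),
    ("n -> n + pi0",  0, -half, -half),
    ("n -> p + pi-", -1, +half, -half),
]
for label, m_pi, m_N, m_init in transitions:
    # <1 m_pi; 1/2 m_N | 1/2 m_init> couples the pion triplet and the
    # nucleon doublet back to a nucleon doublet state.
    amplitude = clebsch_gordan(1, half, half, m_pi, m_N, m_init)
    print(label, "relative intensity:", amplitude**2)   # 2/3, 1/3, 1/3, 2/3
```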
Importance in reactions and decays. The charge independence of the strong interactions has important consequences, defining the intensity ratios of different charge states produced in those particle reactions and decays which are mediated by the strong interactions. A simple illustration is provided by the virtual transitions of a nucleon to a nucleon plus a pion, transitions which are of dominant importance in any consideration of nuclear forces. The neutron and proton form a nucleon I-spin doublet with I = 1/2; the pions form an I-spin triplet with I = 1. If initially there are one neutron and one proton, with no bias in charge state or I3, and the strong forces responsible for the virtual transitions do not discriminate between states with different charge or different values of I3, then it follows that in the final system there must be equal probabilities of finding each charge member of the pion triplet and equal probabilities of finding a neutron or proton. This condition, that the strong interactions cannot differentiate among the members of an isobaric spin multiplet, determines the relative intensities of the transitions (see table). Using the same kind of argument, it is easy to see that the conditions of equal intensity of each member of a multiplet cannot be fulfilled in a transition from an initial doublet to a final state of a doublet and a quartet. Therefore, none of the individual transitions is allowed by charge independence, though charge or I3 is conserved in the decays. In general, decays are allowed for a transition A → B + C only if inequality (3) is satisfied,

|I(B) + I(C)| ≥ |I(A)| ≥ |I(B) − I(C)|    (3)

This is analogous to the
vector addition rule for spin or angular momentum; the strong interactions conserve I-spin in the same manner as angular momentum is conserved. See SELECTION RULES (PHYSICS). Classification of states. I-spin considerations provide insight into the total energies or masses of nuclear and particle states. The fundamental constituents of nuclei are the nucleons, the neutron and proton, spin-1/2 fermions which must obey the Pauli exclusion principle, to the effect that the wave function that describes a set of identical fermions must change sign upon exchange of any two fermions. Similarly, hadrons are described as compounds of quarks, which are also spin-1/2 fermions. The two fermions that make up an I-spin doublet can be considered as different charge states of a basic fermion, even as states with the spin in the plus and minus direction of quantization are consid-
ered as different spin states of the fermion. The extended Pauli exclusion principle then requires that the wave function amplitude change sign upon exchange of spin direction, charge, and spatial coordinates for two (otherwise) identical fermions. See EXCLUSION PRINCIPLE. A space state u(r) of two identical fermions, where r is the vector distance between the two particles, will be even upon exchange of the two particles if u(r) has an even number of nodes, and will be odd under exchange if there is an odd number of nodes. With more nodes, the space wavelength is smaller, and the momentum and energy of the particles are larger. The lowest energy state must then have no spatial nodes and must be even under spatial interchange. From the Pauli principle, the whole wave function must be odd, and then the exchange under spin and I-spin coordinates must be odd. Using this kind of argument, the low-mass (low-energy) states of light nuclei can be classified in terms of their I-spin symmetries. An application of the same principle, that the space wave function of the lowest state must be even under exchange, was an important element in the discovery of a new quantum number (labeled color) for quarks, the elementary constituents of the hadrons, and concomitantly the unfolding of a deeper understanding of fundamental particles. Basis for charge independence. The basis for the symmetry described by I-spin is to be found in the quark structure of the strongly interacting particles and the character of the interactions between quarks. All of the strongly interacting particles are quark compounds; the conserved fermions, such as the neutron and proton, are made up of three quarks; the bosons, such as the π mesons, are quark-antiquark pairs. There are six significantly different quarks arranged in three pairs of charge +2/3 and charge −1/3 particles, (u2/3, d−1/3), (c2/3, s−1/3), and (t2/3, b−1/3), called up and down, charm and strange, and top and bottom. The quarks interact through their strongly interacting color charges, which couple the quarks to gluons in a manner analogous to the coupling of electrical charge to photons. Rather than one kind of electrical charge, and its negative, quarks have three kinds of color charges (conventionally labeled r for red, y for yellow, and b for blue), and their negatives. Even as there are three different colors, there are 3 × 3 − 1 = 8 different gluons labeled with color and anticolor (with the so-called white combination ruled out), rather than the one photon that carries the electromagnetic forces. Since each kind of quark carries exactly the same sets of color charge, the strong forces between two quarks are exactly the same as the forces between two other quarks. However, the simple consequences of this color independence of the strong forces are largely obviated by the fact that the six quarks have different masses. Hence, the Σ+ hyperon (uus) is appreciably heavier than the proton (uud), even as the s-quark is heavier than the d-quark, though the quark-quark forces holding the systems together are the same.
However, the masses of the two lightest quarks, the u and d, are almost the same, differing by only a few electron masses. Then, for the many situations in which this small difference can be neglected, the u and d quarks cannot be differentiated by the strong forces; that is, the neutron (udd) and proton (uud) are identical under the strong forces, which is just the symmetry described by I-spin. There are some effects where the mass difference, equal to about two pion masses, between the s and u quark, or the s and d quark, can be neglected, or for which corrections can be calculated. For such effects twofold symmetries like I-spin, called, respectively, V-spin and U-spin, are useful. In principle, similar broken symmetries exist for compounds of the heavier quarks, but the symmetry breaking that follows from their much larger mass differences so obscures the underlying symmetry that it is not useful. See COLOR (QUANTUM MECHANICS); GLUONS; QUANTUM CHROMODYNAMICS; QUARKS; SYMMETRY LAWS (PHYSICS). Robert K. Adair Bibliography. F. Halzen and A. Martin, Quarks and Leptons, 1984; N. A. Jelley, Fundamentals of Nuclear Physics, 1990; G. Kane, Modern Elementary Particle Physics, updated ed., 1993; L. B. Okun, Leptons and Quarks, 1980, paper 1985; S. S. Wong, Introductory Nuclear Physics, 2d ed., 1999.
Ice cream A commercial dairy food made by freezing while stirring a pasteurized mix of suitable ingredients. The product may include milk fat, nonfat milk solids, or milk-derived ingredients; other ingredients may include corn syrup, water, flavoring, egg products, stabilizers, emulsifiers, and other non-milk-derived ingredients. Air incorporated during the freezing process is also an important component. The structure of ice cream is complex. It consists of solid, gaseous, and liquid phases; ice crystals and air cells are dispersed throughout the liquid phase, which also contains fat globules, milk proteins, and other materials. Composition. In the United States, ice cream composition is regulated by Federal Frozen Dessert Standards of Identity, which set forth minimum composition requirements of not less than 10% milk fat and not less than 20% total milk solids. Ice cream must weigh not less than 4.5 lb/gal (0.54 kg/liter) and must contain not less than 1.6 lb food solids/gal (0.19 kg/liter). In the case of bulky flavors, the fat may be not less than 8%, nor can the total milk solids be less than 16% in the finished food. Ingredient and nutritional-requirements labeling is included in the standards. The composition of ice cream may vary depending on whether it is an economy brand satisfying minimal requirements, a trade brand of average composition, or a premium brand of superior composition. The components by weight of an average-composition ice cream are 12% fat, 11% nonfat milk solids, 15% sugar, and 0.3% vegetable gum stabilizer.
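As an illustration of how the federal minimums above apply, the following sketch checks a hypothetical mix against them. The 12% fat and 23% total milk solids are the average-composition figures quoted above; the 4.5 lb/gal weight and 40% total food solids are assumed values used here for the example only.

```python
# Rough compliance check against the U.S. federal minimums quoted in the text:
# >= 10% milk fat, >= 20% total milk solids, >= 4.5 lb/gal, >= 1.6 lb food solids/gal.
def meets_ice_cream_standard(pct_fat, pct_total_milk_solids,
                             weight_lb_per_gal, pct_total_food_solids):
    return {
        "milk fat >= 10%": pct_fat >= 10.0,
        "total milk solids >= 20%": pct_total_milk_solids >= 20.0,
        "weight >= 4.5 lb/gal": weight_lb_per_gal >= 4.5,
        "food solids >= 1.6 lb/gal":
            weight_lb_per_gal * pct_total_food_solids / 100.0 >= 1.6,
    }

# 12% fat and 11% nonfat milk solids (23% total milk solids) are from the text;
# 4.5 lb/gal and 40% total food solids are assumptions for illustration.
for rule, ok in meets_ice_cream_standard(12.0, 23.0, 4.5, 40.0).items():
    print(rule, "->", "pass" if ok else "fail")
```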
An average serving of 4 fl oz (120 ml) of vanilla ice cream with 10% milk fat provides about 135 calories, 0.4 oz (11.2 g) fat, 0.8 oz (22.4 g) protein, 0.56 oz (15.9 g) carbohydrate, 0.003 oz (88 mg) calcium, 0.0024 oz (67 mg) phosphorus, 270 International Units vitamin A, and 0.0000058 oz (0.164 mg) riboflavin. Thus, a serving of ice cream provides more than 10% of the U.S. Recommended Daily Dietary Allowance (USRDA) for riboflavin and calcium and about 5% of the USRDA for protein for adults and teenagers. See NUTRITION. French ice cream may contain a relatively high fat content, have a slight yellow color, and contain egg yolk solids. Both this product and ice cream carry the same requirements. French ice cream must contain not less than 1.4% egg yolk solids for plain flavor and not less than 1.12% for bulky flavors. It is usually sold as a soft-serve product but may be sold as a prepackaged hardened product. Gelato is Italian for ice cream; it commonly increases in volume by only one-third on freezing. The composition of ice milk is similar to that of ice cream except that it must contain more than 2% but not more than 7% milk fat, not less than 11% total milk solids, and not less than 1.3 lb food solids/gal (0.15 kg/liter). Sherbet is made with about 1 part ice cream mix to 4 parts water ice mix and is manufactured in the same way as ice cream. It must weigh not less than 6 lb/gal (0.7 kg/liter). Sherbet contains not less than 1% or more than 2% fat, and total milk solids of not less than 2% or more than 5% by weight of the finished food; it must also contain 0.35% edible citric or natural fruit acid. A sherbet made with addition of egg yolk is known as frappe. Frozen dairy desserts. Products classed as frozen dairy desserts include French ice cream, frozen custard, ice milk, sherbet, water ice, frozen dairy confection, dietary frozen dairy dessert, and mellorine (imitation ice cream). Most of these desserts can be sold in the soft-serve form, as they are when first removed from the freezer at about 20◦F (−7◦C), or in the hard form as they are when the temperatures are reduced to about 8◦F (−13◦C). A frozen dessert mix is the combination of ingredients, usually in the liquid form and ready to be frozen. Dry mixes must be dissolved in water before freezing. Unflavored mixes may be flavored before freezing, as with vanilla and chocolate, or may have flavorings added to the soft frozen product as it exits the freezer, as with fruit, nut, and candy type desserts. Water ice is made from water, sugar, corn sweeteners, fruit juices, flavorings, and stabilizers. It contains no milk solids. Sorbet is similar; in addition it contains finely divided fruit and is whipped while being frozen. There is no federal standard for sorbet in the United States. Dietary frozen desserts are comparatively low in calories. Ice milk commonly contains about onethird the fat of ice cream. Proposed U.S. Federal Standards permit 2–5% fat in ice milk and establish a category of light (lite) ice cream with 5–7% fat. To provide frozen desserts for diabetics, sugar (sucrose)
is replaced with nonnutritive sweeteners, and complex carbohydrates, such as sorbitol, are added to provide desirable texture. Mellorine is manufactured in a process similar to that for ice cream, but is not limited to milk-derived nonfat solids and may be composed of animal or vegetable fat or both, only part of which may be milk fat. It contains not less than 6% fat and not less than 2.7% protein by weight. Frozen dairy confections are produced in the form of individual servings and are commonly referred to as novelties, including bars, stick items, and sandwiches. These constitute an important segment of the industry. Soft frozen dairy products make up about 9.6% of the total frozen desserts produced in the United States. Per capita consumption is 5 qt (4.7 liters) per year. The soft frozen product is usually drawn from the freezer at 18 to 20°F (−8 to −7°C) and is served directly to the consumer. Ice milk accounts for about three-fourths of the soft-serve products that are manufactured in the United States. These products usually vary from 2 to 7% fat, 11 to 14% nonfat milk solids, 13 to 18% sugar, and have about 0.4% stabilizer. Frozen custard and ice cream are also important soft-serve products. Commercial manufacture. In ice cream manufacture the basic ingredients are blended together (see illus.). The process ranges from small-batch operations, in which the ingredients are weighed or measured by hand, to large automated operations, where the ingredients, all in liquid form, are metered into the mix-making equipment. The liquid materials, including milk, cream, concentrated milk, liquid
sugar syrup, and water, are mixed. The dry solids, such as nonfat dry milk, dried egg yolk, stabilizer, and emulsifier, are blended with the liquid ingredients. This liquid blend is known as the mix. Following the blending operation, the mix is pasteurized, homogenized, cooled, and aged. Pasteurization destroys all harmful microorganisms and improves the storage properties of the ice cream. See PASTEURIZATION. The hot mix is pumped from the pasteurizer through a homogenizer, which reduces the fat globules to less than 2 micrometers in diameter. Homogenization is accomplished by forcing the mix through the homogenization valves under pressure in the range 1500–3000 lb/in.2 (10.3–20.7 megapascals). This phase of the manufacturing process results in a uniform, smooth product and limits churning of the fat during the freezing process. The mix is cooled immediately to 32–40°F (0–4°C); then it is aged 4–12 h in order to permit development of favorable hydration and particle orientation effects, which serve to improve the body and texture in the finished ice cream. Soluble flavoring materials are usually added to the mix just before the freezing process, but fruits, nuts, and candies are not added until the ice cream is discharged from the freezer. In the United States, popular flavors of ice cream include vanilla, chocolate, strawberry, butter pecan, and "cookies 'n cream" (broken cookies are blended into the ice cream), and there are many other flavors. Neapolitan is one-third each of vanilla, chocolate, and strawberry extruded simultaneously into one container. Ice cream is frozen in batch or continuous
Flow chart for ice cream processing: dry and liquid ingredient receiving, blending, pasteurizing (68°C for 30 min or 79°C for 25 s), homogenizing (71°C), cooling (0 to 4°C), mix storage (0 to 4°C), flavoring (flavors, fruits, and nuts), freezing (−6°C), packaging and wrapping (−4°C), hardening (−43 to −46°C), and storage (−23 to −29°C). °F = (°C × 1.8) + 32. (Arbuckle and Co.)
freezers. The continuous freezing process is more effective, as it freezes more rapidly and produces a product with finer texture. Continuous freezers vary in capacity from 85 gal (323 liters) to more than 2000 gal (7600 liters) per hour. During freezing, air is incorporated into the mix, resulting in increased volume. This increase in volume is called overrun. The drawing temperature of the ice cream from the freezer is about 21°F (−6°C). The ice cream is packaged at this temperature and cooled further to about −20°F (−29°C) in a hardening room, where it is stored until marketed. The ice cream industry of the United States has been described as a high-volume, highly automated, progressive, very competitive industry composed of large and small factories. It has developed on the basis of abundant and economical supply of ingredients. Also contributing to the growth of the industry are factors such as achievement of high quality; favorable consumer response; excellent transportation, refrigeration, and distribution; advanced packaging; and abundant home refrigeration. The annual per capita production in the United States is about 15 qt (14 liters) of ice cream. Combined production of ice cream, ice milk, sherbet, and mellorine amounts to about 22 qt (21 liters) per person. This amounts to 1.32 billion gal (5 billion liters) per year of ice cream and related products. Wendell S. Arbuckle; Robert T. Marshall Sources of microorganisms. The cream, condensed- and dry-milk products, egg products, sugar, stabilizers, flavoring and coloring materials, and fruits and nuts used as ingredients in ice cream may be contaminated with microorganisms. Therefore the ice cream mix must be pasteurized before freezing because a low temperature in itself is not sufficient to kill microorganisms. Other possible sources of microorganisms are equipment and utensils used in manufacture, scoops and dippers, vendors, employees, and air. Stringent regulations must be established for all manipulations of ice cream from producer to consumer in order to avoid recontamination, especially with regard to pathogens. Once incorporated, microorganisms survive a long time in ice cream. Kinds of microorganisms. A wide variety of microorganisms may be found, including streptococci, micrococci, coliform bacteria, spore-formers, yeasts, and molds. No selection of species by freezing takes place. Since no growth and only slow destruction of microorganisms in the final product occur, the flora of ice cream consists of organisms surviving pasteurization or incorporated by contamination. With well-controlled sanitation during production, plate counts may be as low as a few hundred; with careless methods they may be in the millions. Normally a slight increase of the bacterial count takes place during manufacture, especially after homogenization and freezing, due to the breaking up of clumps of bacteria. A plate count not greater than 50,000–100,000 per gram is regarded as acceptable from a sanitary point of view in most states.
Hygienic measures. These include proper pasteurization of the mix and adequate sanitizing of equipment. Pasteurization must be more intensive with the mix than with milk because increased milk solids and sugar protect microorganisms to some degree. At least 155◦F (68.4◦C) for 30 min or 175◦F (79.5◦C) for 25 s is applied to obtain a sufficient margin of safety. In this way all pathogens are killed and total bacterial counts of 10,000 or less per gram are usually reached. Higher counts after pasteurization are due to thermoduric organisms. The source of these organisms is inadequately cleaned farm or plant equipment. Sanitizing of equipment is achieved by cleaning and disinfecting operations with heat or chemicals. See DAIRY MACHINERY; STERILIZATION. Bacteriological control. For determining the hygienic quality of ice cream, the laboratory tests used are similar to those employed for milk. Of these, the standard plate count and the test for coliform bacteria have found widespread application. See FOOD MANUFACTURING; MILK. William C. Winder Bibliography. W. S. Arbuckle, Ice Cream, 5th ed., 1996; International Ice Cream Association, The Latest Scoop: Facts and Figures on Ice Cream and Related Products, annually; R. K. Robinson (ed.), Modern Dairy Technology, 2d ed., 1993.
Ice field A network of interconnected glaciers or ice streams, with common source area or areas, in contrast to ice sheets and ice caps. The German word Eisstromnetz, which translates literally to ice-stream net, is sometimes used for glacial systems of moderate size (such as less than 3000 mi2 or 8000 km2) and is most applicable to mountainous regions. Being generally associated with terrane of substantial relief, ice-field glaciers are mostly of the broad-basin, cirque, and mountain-valley type. Thus, different sections of an ice field are often separated by linear ranges, bedrock ridges, and nunataks. Contrast with ice sheet. An ice sheet is a broad, cakelike glacial mass with a relatively flat surface and gentle relief. Ice sheets are not confined or controlled by valley topography and usually cover broad topographic features such as a continental plateau (for example, much of the Antarctic ice sheet), or a lowland polar archipelago (such as the Greenland ice sheet). Although ice sheets are generally of very large dimension, in some regions small, rather flat ice bodies have been called ice sheets because they are thinned remnants of once large masses of this form. Small ice sheets and even ice fields are sometimes incorrectly referred to as ice caps, even though their configurations have been well characterized. Contrast with ice cap. Ice caps are properly defined as domelike glacial masses, usually at high elevation. They may, for example, make up the central nourishment area of an ice field at the crest of a mountain range, or they may exist in isolated positions as
separate glacial units in themselves. The latter type is characterized by a distinctly convex summit dome, bordered by contiguous glacial slopes with relatively regular margins not dissected by outlet valleys or abutment ridges. Similarities and gradations. There are all gradations between ice caps, ice fields, and ice sheets. Over a period of time, a morphogenetic gradational sequence may also develop in any one region. Major ice sheets, for example, probably originate from the thickening and expansion of ice fields and the coalescence of bordering piedmont glaciers. Conversely, ice fields can develop through the thinning and retraction of a large ice sheet overlying mountainous terrane. See GLACIATED TERRAIN; GLACIOLOGY. Maynard M. Miller
Ice manufacture Commercial production of manufactured artificial ice from water, or of dry ice from the solidification of carbon dioxide, by mechanical refrigeration. Of great economic importance was the manufacture of water ice for use in refrigeration units, fishing boats, fish- and meat-packing plants, and dairies. However, there was a sharp increase in the trend toward cubing, sizing, and packaging of ice. The domestic ice market practically vanished with development of the automatic household refrigerator. See DRY ICE. Most ice is made in galvanized cans that are partially immersed in brine in an ice-making tank (Fig. 1). Brine made of sodium chloride or calcium chloride is used. The brine is cooled by ammonia as the refrigerant evaporates in pipe coils or brine cool-
ers submerged in the ice tank. The ice cans are filled with raw water, or treated water if it initially contains large amounts of impurities. The water is usually precooled. Cold brine circulates around the ice cans to freeze the water. Commercial ice is frozen in 300- or 400-lb (135- or 180-kg) blocks for which the freezing time with brine at 12◦F (−11◦C) is 38–42 h. Freezing time depends largely on the thickness of the ice cake and the brine temperature. In large plants, cans are arranged in group frames for harvesting as many as 34 cans at a time. A traveling crane picks up a frozen group, transports and drops it into a dip tank for thawing, then moves it to a can dump where the ice cakes slide out and into a storage room. The empty cans are refilled with fresh water and are returned to the ice tank by the crane. If clear ice is required, the water in each can must be agitated with air during freezing; otherwise opaque ice is formed. Because the water must be cooled to 32◦F (0◦C) before freezing can start and because of system losses, about 1.6 tons (1.5 metric tons) of refrigeration is required to make 1 ton (0.9 metric ton) of ice when the raw water enters at 70◦F (21◦C). The manufacture of ice in slush, flake, or cube form by continuous ice-makers (Fig. 2) constitutes a large portion of the commercial market. Also, small, fully automatic, self-contained, cube and flake icemakers are used in bars, restaurants, hotels, and hospitals. Ice is also frozen artificially in a flat horizontal sheet for ice-skating. Such skating rinks may be indoor or outdoor, permanent or portable. The rink floor is covered with pipe coils through which cold brine is circulated and over which water is sprayed
Fig. 1. Typical can ice plant. 300 lb = 135 kg. (Worthington Pump and Machinery Corp.)
Fig. 2. Commercial flake ice machine.
until a frozen sheet 1/2–3/4 in. (13–19 mm) thick is obtained. The brine is cooled in coolers by a refrigerant such as ammonia or Freon-12. See REFRIGERATION. Carl F. Kayan
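As a rough check on the figure of about 1.6 tons of refrigeration per ton of ice quoted above, the following back-of-envelope sketch uses standard property values that are assumed here rather than taken from this entry: a specific heat of water of about 1 Btu/(lb·°F), a latent heat of fusion of about 144 Btu/lb, and 1 ton of refrigeration = 12,000 Btu/h.

```python
# Heat to remove per pound of ice made from 70 deg F raw water:
water_in_F = 70.0
freeze_F = 32.0
sensible = 1.0 * (water_in_F - freeze_F)       # Btu/lb, cooling water to 32 deg F
latent = 144.0                                 # Btu/lb, freezing at 32 deg F
btu_per_ton_ice = (sensible + latent) * 2000.0 # one short ton (2000 lb) of ice

# Refrigeration capacity needed to make one ton of ice per day
ton_refrig_btu_per_h = 12000.0
capacity = btu_per_ton_ice / (ton_refrig_btu_per_h * 24.0)
print(f"ideal: {capacity:.2f} tons of refrigeration per ton of ice per day")  # ~1.26
print(f"with ~25% allowance for system losses: {capacity * 1.25:.2f}")        # ~1.6
```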
Ice point The temperature at which liquid and solid water are in equilibrium under atmospheric pressure. The ice point is by far the most important “fixed point” for defining temperature scales and for calibrating thermometers. It is 273.15 K (0◦C or 32◦F). A closely related point is the triple point, where liquid, solid, and gaseous water are in equilibrium. It is 0.01 K (0.018◦F) higher on the Kelvin scale than the ice
Arrangement for determining triple point.
point. The triple point has gained favor as the primary standard since it can be attained with great accuracy in a simple closed vessel, isolated from the atmosphere. Readings are reproducible to about 0.0001 K (0.0002◦F), but dissolved gases or other foreign matter may raise the error to 0.001 K (0.002◦F) or more. See TEMPERATURE; TRIPLE POINT. The triple-point apparatus shown in the illustration consists of a thermometer well that is filled with liquid water and jacketed by a cavity containing the three phases of water under a pressure of about 0.006 atm (0.09 lb/in.2 or 600 pascals). The ice, initially deposited by prechilling the well, melts during heat transfer from the thermometer. See PHASE EQUILIBRIUM. Ralph A. Burton Bibliography. J. F. Schooley (ed.), Temperature: Its Measurement and Control in Science and Industry, 1982; K. Wark, Thermodynamics, 6th ed., 2000.
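A minimal sketch (illustrative only) converting the fixed points quoted above between the Kelvin, Celsius, and Fahrenheit scales; it reproduces the stated 0.018°F offset of the triple point.

```python
def k_to_c(T_k):
    return T_k - 273.15

def c_to_f(T_c):
    return T_c * 1.8 + 32.0

ice_point_K = 273.15
triple_point_K = ice_point_K + 0.01   # triple point lies 0.01 K above the ice point
for name, T in [("ice point", ice_point_K), ("triple point", triple_point_K)]:
    c = k_to_c(T)
    print(f"{name}: {T} K = {c:.2f} C = {c_to_f(c):.3f} F")
# A 0.01 K interval is 0.01 * 1.8 = 0.018 deg F, as stated in the entry.
```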
Iceberg A large mass of glacial ice broken off and drifted from parent glaciers or ice shelves along polar seas. Icebergs are distinguished from polar pack ice, which is sea ice, and from frozen seawater, whose rafted or hummocked fragments may resemble small icebergs. See GLACIOLOGY; SEA ICE. Characteristics and types. The continental or island icecaps of both the Arctic and Antarctic regions produce icebergs where the icecaps extend to the sea as glaciers or ice shelves. The "calving" of a large iceberg is one of nature's great spectacles, considering that a Greenland iceberg may weigh over 5 million tons and that Antarctic icebergs are many times larger. An iceberg consists of glacial ice, which is compressed snow, having a variable density that averages about 0.89 gram per cubic centimeter. An iceberg's above-water volume is about one-eighth to one-seventh of the entire mass. However, spires and peaks of an eroded or weathered iceberg result in height-to-depth ratios of between 1:6 and 1:3. Tritium age experiments on melted ice from Greenland icebergs indicate that they may be around 50,000 years old. Minute air bubbles entrapped in glacial ice impart a snow-white color to it and cause it to effervesce when immersed. See SEAWATER; TRITIUM. Icebergs are classified by shape and size. Terms describing icebergs include arched, blocky, domed, pinnacled, tabular, valley, and weathered, as well as bergy-bit and growler for iceberg fragments that are smaller than cottage size above water. An iceberg may last indefinitely in cold polar waters, eroding only slightly during the summer months. But an iceberg that drifts into warmer water disintegrates in a matter of weeks in sea temperatures between 40 and 50°F (4 and 10°C) or in days in sea temperatures over 50°F. A notable feature of icebergs is their long and distant drift, under the influence of ocean currents, into shipping lanes where they become navigation hazards. The normal extent of iceberg drift is shown in Fig. 1.
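The one-eighth to one-seventh figure follows from Archimedes' principle: the submerged fraction of an iceberg equals the ratio of ice density to seawater density. A short illustrative calculation, assuming a typical seawater density of about 1.025 g/cm³ (the ice density is the 0.89 g/cm³ quoted above):

```python
rho_ice = 0.89        # glacial ice density from the text, g/cm3
rho_seawater = 1.025  # assumed typical seawater density, g/cm3

# By Archimedes' principle, submerged fraction = rho_ice / rho_seawater.
above_water_fraction = 1.0 - rho_ice / rho_seawater
print(f"fraction above water: {above_water_fraction:.3f} "
      f"(about 1/{1.0 / above_water_fraction:.0f} of the total volume)")
```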
Fig. 1. Normal extent of iceberg drift, indicated by darker color tint.
Arctic icebergs. In the Arctic, icebergs originate chiefly from glaciers along Greenland’s coasts. It is estimated that a total of about 16,000 icebergs are calved annually in the Northern Hemisphere, of which over 90% are of Greenland origin. But only about half of these have a size or source location to enable them to achieve any significant drift. The majority of the latter stem from some 20 glaciers along the west coast of Greenland between the 65th and 80th parallels of latitude. The most productive glacier is the Jacobshavn at latitude 68◦N, calving about 1400 icebergs yearly, and the largest is the Humboldt Glacier at latitude 79◦ with a seaward front extending 65 mi (104 km). The remainder of the Arctic icebergs come from East Greenland and the icecaps of Ellesmere Island, Iceland, Spitsbergen, and Novaya Zemlya, with almost no sizable icebergs produced along the Eurasian or Alaskan Arctic coasts. No icebergs are discharged or drift into the North Pacific Ocean or its adjacent seas, except for a few small ones that calve each year from the piedmont glaciers along the Gulf of Alaska. These achieve no significant drift. See ARCTIC OCEAN; ATLANTIC OCEAN. The Arctic ocean currents and adjacent seas determine the drift and ultimately, the distribution of icebergs—wind having only short-term effects. Icebergs along the East Greenland coast drift predominantly southward around the tip of Greenland (Cape Farewell) and then northward along its west coast. Here they join the West Greenland icebergs and travel in a counterclockwise gyre across Davis Strait and Baffin Bay. The icebergs are then carried southward under the influence of the Labrador Current along the coasts of Baffin Island, Labrador, and Newfoundland. This drift terminates along the Grand Banks of Newfoundland, where the waters of the
Labrador Current mix with the warm northeastward flowing Gulf Stream. Here even the largest icebergs melt within 2–3 weeks. See GULF STREAM. Freak iceberg drifts, outside the limits shown in Fig. 1, have been reported in the mid-Atlantic and off Scotland, Nova Scotia, and the Azores. In 1926 the southernmost known iceberg (a growler) reached 30°20′N, 62°32′W (about 150 mi from Bermuda). In 1912 a growler was sighted 75 mi east of Chesapeake Bay, U.S.A. Such sightings, however, are extremely rare. About 500 icebergs a year are carried past Newfoundland (latitude 48°N) and into the North Atlantic, usually between February and June. These are survivors of an estimated 3-year journey from West Greenland. Most icebergs become stranded along the Arctic coasts and are ultimately destroyed by wave action and summer melting. The number arriving at the Grand Banks each year varies from near zero (in 1924, 1940, 1941, 1958) to 1500 or more (in 1972, 1984, 1991, 1994). Icebergs in the Northern Hemisphere may reach proportions as large as 2000 ft in width and 400 ft in height above the water (Fig. 2). Such icebergs weigh upward of 15 million tons. More commonly, icebergs are 200–500 ft in width and 200 ft in height. The highest iceberg ever sighted (1959, in Melville Bay, Greenland) was 550 ft above the water. Shipping hazards, however, come not so much from large icebergs as from smaller ones, which can be difficult to detect by radar. Ice islands. An icecap extending seaward on a broad front may form a landfast floating ice shelf rather than an iceberg-producing glacier. Such ice shelves are common in the Antarctic but relatively rare in the Arctic. Ice broken from the shelf drifts freely as large tabular masses, ranging in size
Fig. 2. Arctic iceberg, eroded to form a valley or drydock type; grotesque shapes are common to the glacially produced icebergs of the North. Note the brash and small floes of sea ice surrounding the berg.
from a few square miles to several hundred square miles (Fig. 3). The term “ice island” is limited to the Arctic. Similar masses in the Antarctic are termed icebergs. Ice islands originate from the ice shelves of the northern shores of the Canadian archipelago and Greenland, and usually circulate in the Arctic
Ocean for an indefinite time. The principal source of large ice islands is the Ward Hunt Ice Shelf on Northern Ellesmere Island. Aerial reconnaissance since 1946 has observed about 100 ice islands. They occasionally drift into the Norwegian Sea and Baffin Bay, where they erode and become icebergs. During
Fig. 3. Antarctic iceberg, tabular type. Such bergs develop from great ice shelves along Antarctica and may reach over 100 mi (160 km) in length. The U.S. Coast Guard icebreaker Westwind is in the foreground.
World War II, Winston Churchill proposed establishing airfields on ice islands in the Norwegian Sea. This was not done because of the faster rate of melting in those waters. Scientific camps have been set up on ice islands mostly by United States and Russian research agencies. International Ice Patrol. The sinking of the passenger liner Titanic in 1912 with a loss of over 1500 lives resulted in the formation of the International Conferences on the Safety of Life at Sea. This included the establishment of the International Ice Patrol. The Patrol, operated by the U.S. Coast Guard, maintains an ice observation and reporting service in the North Atlantic Ocean. It is supported by 20 nations and has operated each year (except wartime) since 1913. First by ships, then search aircraft, the Patrol now uses satellite imagery and airborne radar as observation tools. It also has carried out oceanographic surveys, Greenland glacier mapping, and iceberg destruction experiments. Other ice reporting services are maintained by Canada, Denmark, Iceland, Finland, Japan, and Russia for their respective waters, and the U.S. National Ice Center for Polar Regions. Antarctic icebergs. In the Southern Ocean, icebergs originate from the giant ice shelves along the Antarctic continent and from glaciers which may flow into the shelves or form floating ice tongues. Masses breaking from either can result in huge, tabular icebergs 200 ft high and several hundred square miles in area. The world's largest glacier and iceberg producer is the Lambert Glacier (70°E longitude), which merges with the Amery Ice Shelf. In 1963 a giant iceberg measuring 50 by 70 mi was calved. This iceberg was tracked until 1976, by which time it had broken into thousands of smaller icebergs. The largest iceberg ever reported was calved in 1992 from the Thwaites Ice Tongue (110°W longitude). It was larger than the state of Rhode Island, and was still being tracked in 2000 with a size of about 25 by 50 mi. Antarctic icebergs drift throughout the Southern Ocean under the influence of the Antarctic Circumpolar Current, and into the South Atlantic, Indian, and Pacific oceans, where they have been sighted as far north as latitude 35°S (Fig. 1). When weathered, Antarctic icebergs attain a deep bluish hue of great beauty rarely seen in the Arctic. Water volume. The total volume of ice calved each year as icebergs is estimated to equal about half the volume of the world's total water usage. The feasibility of towing or otherwise moving large icebergs to water-deficient lands has been proposed and studied. But the difficulties of towing large icebergs over 6 million tons (as determined by actual tests), and the rapid deterioration of icebergs in warm waters appear to make such a venture impracticable at this time. Robertson P. Dinsmore Bibliography. N. Bowditch, Ice in the sea, The American Practical Navigator, chap. 34, U.S. Defense Mapping Agency Publ., no. 9, 1995; T. F. Budinger, Iceberg detection by radar, Proceedings of the International Maritime Consultative Or-
ganization Conference on Safety of Life at Sea, London, 1960; R. P. Dinsmore, Ice and Its Drift into the North Atlantic Ocean, International Commission for Northwest Atlantic Fisheries (ICNAF) Spec. Publ., no. 8, 1972; J. Sater, Arctic Drifting Stations, Report on Activities Supported by the Office of Naval Research, 1968; E. H. Smith, The Marion Expedition to Davis Strait and Baffin Bay, 1928-29-30-31, 3 vols., U.S. Coast Guard Bull., no. 19., 1932; U.S. Coast Guard, Annual Reports of the International Ice Patrol, USCG Publ., no. 188, 1919 onward; U.S. Defense Mapping Agency, Sailing Directions for Antarctica, DMA Publ., no. 200, 1992; World Meteorological Organization, Sea Ice Nomenclature, WMO Publ., no. 259, 1971.
Icebreaker A ship designed to break floating ice. Icebreaker technology has evolved rapidly since the 1960s as a result of potential resource development in the Arctic regions of Russia, Canada, and the United States. This led to the construction of icebreaking ships that can transit to all areas of the world, including the North Pole. The Arctic Ocean and Antarctica play a large role in shaping the global climate. With the Arctic Ocean and the ice-covered waters of Antarctica being the least studied of all the oceans, icebreakers provide the platforms from which polar science and research can be conducted on a year-round basis. On August 22, 1994, the first U.S. and Canadian icebreakers, USCGC Polar Sea and CCGS Louis S. St. Laurent, arrived at the North Pole after having conducted scientific research in areas never previously studied. See ANTARCTIC OCEAN; ARCTIC OCEAN. Types. Icebreakers may be classed as polar or subpolar, depending on their primary geographic area of operation. Polar icebreakers have a capability for independent operation in first-year, second-year, and multiyear ice in the Arctic or the Antarctic. Subpolar icebreakers are capable of icebreaking operations in the ice-covered waters of coastal seas and lakes outside the polar regions. See MARITIME METEOROLOGY; SEA ICE. Resistance. The speed that an icebreaker can achieve in ice depends upon the resistance of the hull to break the ice and submerge the broken ice pieces. The resistance increases approximately as the square of the level ice thickness. As an example, the resistance in breaking 4 ft (1 m) of ice is basically four times that for breaking 2 ft (0.5 m) of ice. Resistance increases almost linearly with an increase in the ship's beam. Other factors that influence resistance are the amount of friction developed between the hull surface and the ice, and, to a lesser degree, the length of the icebreaker. Hull form. The combined shapes of the bow, midbody, and stern constitute the hull form. Each component has unique requirements to make it efficient in icebreaking and ice transiting. The icebreaker's bow must be designed to break ice efficiently, and
Outboard profile of the U.S. Coast Guard icebreaker Polar Sea.
it consists of an inclined wedge that forms an angle of about 20◦ with the ice surface (see illus.). As the icebreaker advances, the bow rides up on the edge of the ice until the weight of the ship becomes sufficiently large that the downward force causes the ice to fail. As the icebreaker advances, the broken ice pieces move under the hull and the process is repeated. The sides of the icebreaker can be vertical or sloped. Traditionally, icebreakers have sloped sides of 5–20◦ to reduce the frictional resistance. Sloped sides can also improve the turning ability (maneuverability) of the ship in ice. The stern of the icebreaker is designed in a manner similar to the bow so that the ship can break ice while going astern. It must also be designed for the proper flow of water into the propellers in both the ahead and astern directions. Because of the many factors that influence icebreaking performance, the design involves considerable compromise and judgment, particularly when icebreakers may spend 50% of their time in open water. See SHIP DESIGN. Hull structure. The hull of a polar icebreaker is built very strong so that it can withstand tremendous ice forces. Many icebreakers have to withstand impacts with ice that is 20 ft (6 m) thick. Because these frequent and high loads occur at very low air temperatures of −50◦F (−45◦C), specialty steels are used on polar icebreakers. It is common to have bow and stern plating of 1.5–2.0 in. (4–5 cm) in thickness. The side and bottom plating thickness are somewhat less. Subpolar icebreakers, with only a limited ice transiting capability, have only a portion of the hull strengthened. It is called an ice belt and runs several feet above and below the waterline of the hull. Propulsion. Icebreaker propulsion systems consist of engines, transmission systems, and propellers. Most icebreakers are powered with diesel engines. However, when the horsepower required for icebreaking is sufficiently great, gas turbines or nuclear steam turbines are needed. Transmission systems are either mechanical (primarily gears) or electrical (generators and motors). Propellers on an icebreaker can number from one to four, and the propellers can be either fixed pitch or controllable pitch. A single pro-
peller is frequently used on those ships requiring only limited icebreaking capability. Most icebreakers, however, have either two or three propellers for purposes of system reliability and improved maneuverability. All propulsion systems must be capable of withstanding the shock and high torque loads that are transmitted to the system when the propeller blades impact or mill ice pieces. See MARINE ENGINE; MARINE MACHINERY; PROPELLER (MARINE CRAFT); SHIP POWERING, MANEUVERING, AND SEAKEEPING. Reduction of icebreaking resistance. Many auxiliary methods have been developed to improve the performance of icebreakers by reducing the friction between the hull and the ice. One widely used method is to roll or heel the ship from side to side. The rolling motion is achieved by pumping large quantities of water from side to side in specially configured heeling tanks. In addition to heeling systems, most icebreakers use a low-friction coating or a stainless-steel ice belt to reduce icebreaking resistance. Another concept is the use of air bubbler systems. In these systems, icebreaking resistance is reduced by discharging large volumes of air underwater at the bow and sides of the icebreaker. The rising air bubbles carry large quantities of water to the surface and reduce the friction at the hull-ice interface. Another method to reduce friction is the water wash system, involving pumping large quantities of water through nozzles located in the hull onto the ice surface at the bow and forward sides of the icebreaker. Tests in ice model basins. The ability to evaluate the icebreaking performance of new designs before construction is very important. A scaled model of the ship is made and tested in a towing basin inside a large refrigerated space. Ice is grown on the water surface, and the model is towed or self-propelled through the ice to determine the resistance. See TOWING TANK. Richard P. Voelker Bibliography. L. W. Brigham (ed.), The Soviet Maritime Arctic, 1991; R. A. Dick and J. E. Laframboise, An empirical review of the design and performance of icebreakers, Mar. Technol., 26(2):145– 159, 1989; E. V. Lewis (ed.), Principles of Naval
Architecture, 2d ed., 3 vols., Society of Naval Architects and Marine Engineers, 1988; R. P. Voelker et al., Eight years of performance measurements aboard Polar class icebreakers, Soc. Nav. Arch. Mar. Eng. Trans., 100:271–292, 1992; H. Yamaguchi, Experimental Voyage through Northern Sea Route, INSROP Symposium Tokyo '95, 1996.
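The quadratic dependence of icebreaking resistance on level-ice thickness described under Resistance in this entry can be illustrated with a simple scaling sketch; the proportionality constant and the beam value below are arbitrary illustrations, not design data.

```python
def ice_resistance(thickness_m, beam_m, k=1.0):
    """Relative icebreaking resistance, R proportional to k * beam * thickness**2."""
    return k * beam_m * thickness_m ** 2

# Doubling the ice thickness at constant beam quadruples the resistance,
# as stated in the entry (1 m vs 0.5 m of level ice).
r_half = ice_resistance(0.5, 25.0)
r_full = ice_resistance(1.0, 25.0)
print(f"1.0 m ice vs 0.5 m ice: {r_full / r_half:.1f}x the resistance")
```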
Ichneumon The common name for medium- to large-sized insects (wasps) comprising 25,000 species and belonging to the family Ichneumonidae (Hymenoptera); together with the Braconidae (with an additional 15,000 species) they form the superfamily Ichneumonoidea. The majority of ichneumons are slender wasps with long filiform antennae, freely rotating head, and permanantly extruded ovipositor. The ichneumons, found worldwide, are most abundant in heavy vegetational areas such as the edges of woodlands, hedgerows, and in secondary growth vegetation, and are often seen in the woodland canopy or in the litter of the forest floor. Ichneumons are more commonly seen in the spring and fall, as these wasps avoid hot, dry weather. All members of the family Ichneumonidae are parasitic during their immature stages. They attack a wide variety of insects, usually the larval stages of beetles, butterflies, moths, flies, and other Hymenoptera. Since many of their insect hosts are considered pests of forest and agricultural crops, ichneumons are beneficial and important in biological control. See ENTOMOLOGY, ECONOMIC. Female ichneumons use chemical and physical cues emanating from the host (the insect on or within which the ichneumon’s progeny will develop) to locate it for egg laying. The length of the ovipositor is variable, being relatively short in species attacking exposed hosts, and extremely long in species attacking hosts enclosed in galls or stem cavities, or boring in wood. Some species, such as in the genus Megarhyssa (see illus.), may have ovipositors longer than the body.
Long-tailed ichneumon (Megarhyssa lunator), about 6 in. (15 cm), of which more than 4 in. (10 cm) is “tail.”
The female ichneumon deposits an egg on or within the host. Most species are solitary; that is, only one egg is laid. A few species are gregarious as larvae, with the female laying several eggs in the host which all develop into adults. Species that lay their eggs on the host are referred to as ectoparasites; those which deposit their eggs within the host, endoparasites. The eggs hatch into white, fleshy, grublike larvae which slowly consume the host. The ichneumon larvae feed through the cuticle if they are ectoparasites or feed internally on tissues if they are endoparasites. The mature larvae usually kill the host as they emerge and subsequently pupate in a cocoon. The females of some species of ectoparasitic ichneumons inject a venom that paralyzes the host prior to laying an egg. Some endoparasites have been found to inject a symbiotic virus along with an egg into the host. The virus appears to alter the physiology of the host for the parasite’s benefit. Most ichneumons are host-specific; that is, they only attack and develop in a particular species of host. However, a few species are polyphagous, and attack a number of different hosts. The ability to attack and destroy a wide variety of insects makes ichneumons particularly important to humans in suppressing pest insects. See HYMENOPTERA; INSECTA. S. B. Vinson Bibliography. R. P. Askew, Parasitic Insects, 1971.
Ichthyopterygia A supraorder of extinct marine reptiles from the Mesozoic, or the Age of Dinosaurs, belonging to the subclass Diapsida. The group consists of the order Ichthyosauria and several basal forms from the Early Triassic. See DIAPSIDA; DINOSAUR; REPTILIA. Fossil record. Ichthyopterygians existed for about 160 million years. Their oldest definitive record comes from the latest part of the Early Triassic (latest Olenekian), whereas the youngest record is from the strata formed during the earliest part of the Late Cretaceous (Cenomanian). Ichthyosaurs became extinct about 25 million years before dinosaurs disappeared at the end of the Cretaceous. The timing approximately corresponds to an extinction event of marine organisms at the Cenomanian/Turonian boundary. See EXTINCTION (BIOLOGY). Body characteristics. Ichthyopterygians are noted for their tuna-shaped (tunniform) body outline in combination with the presence of a well-defined tail fin or caudal fluke. Cetaceans (whales and dolphins) were the only other tetrapods (vertebrates with four limbs) to evolve this body form. Early ichthyosaurs were elongate and could be loosely described as “lizards with flippers.” These early forms probably propelled themselves using body undulation, as in living cat sharks. Various body outlines evolved during the Middle Triassic, but only a single lineage, which gave rise to a demarcated caudal fluke, survived the Late Triassic. A tuna-shaped body outline also became established by mid Late Triassic
time (middle Norian), and this group diversified into a large number of species during the Jurassic and Cretaceous. Body length varied from approximately 0.5 m (2 ft) to over 20 m (67 ft) in ichthyosaurs. Body size of the first ichthyosaurs (Early Triassic) never exceeded 3 m (10 ft), but giants reaching 15 m (50 ft) or more appeared during the Late Triassic (Shonisaurus), and some fragments suggest the possibility of a second whale-sized form in the Early Jurassic. The largest species from the Jurassic and Cretaceous attained maximum lengths of 10 m (33 ft). Eyes. Many Jurassic ichthyosaurs possessed disproportionately large eyeballs, including the largest eyes that have ever been measured in vertebrates, which reached over 26 cm (10 in.) in diameter in Temnodontosaurus, whose body length was about 9 m (30 ft). Ophthalmosaurus had eyes measuring 23 cm (9 in.) and had a body length of less than 4 m (13 ft). See EYE (VERTEBRATE). Limbs. Early ichthyosaurs had five digits (fingers) in their hands and feet (that is, their front and back flippers), as in most other tetrapods. However, some later species had as many as 10 digits per hand, whereas others had as few as three. These ichthyosaurs also had disk-shaped phalanges and metacarpals (finger and palm bones), unlike most other vertebrates with limbs. Diet. Ichthyopterygian diet included fish, cephalopods, shells, and some other vertebrates such as sea turtles and birds. Possibilities also existed for feeding upon other marine reptiles. R. Motani Bibliography. C. McGowan and R. Motani, Ichthyopterygia, Handbuch der Paläoherpetologie, Part 8, Verlag Dr. Friedrich Pfeil, Munich, 2003; C. McGowan, Dinosaurs, Spitfires, and Sea Dragons, Harvard University Press, Cambridge, MA, 1991; R. Motani, Ichthyosauria: Evolution and physical constraints of fish-shaped reptiles, Annu. Rev. Earth Planet. Sci., 33:12.1–12.26, 2005; R. Motani, Rulers of the Jurassic seas, Sci. Amer., 283(6):52–59, December 2000.
Ichthyornithiformes A group of extinct flying birds known exclusively from the latest stages of the Cretaceous (80–65 million years ago) in North America. Along with Hesperornithiformes, Ichthyornithiforms are some of the earliest nonmodern bird taxa known to science. Indeed, alongside Archaeopteryx (from much earlier Jurassic rocks in Germany, about 140 million years old), the ichthyornithiforms Ichthyornis, Apatornis, and their kin formed the basis for knowledge of avian evolution for almost a century. Although the fossil record of these birds has improved in recent years, Ichthyornithiformes are still poorly understood. See AVES; HESPERORNITHIFORMES; ODONTOGNATHAE. The discovery of Ichthyornis in 1875 caused excitement—here was a flying bird that possessed sharp pointed teeth. Charles Darwin wrote about
Ichthyornithiformes (and their relatives, Hesperornithiformes) in 1880 and noted that they appeared to provide excellent evidence in support of his ideas on the origin of species. Most recently, these archaic avians were placed within the clade Ornithurae, which also contains Hesperornithiformes and all modern birds (Neornithes). See NEORNITHES. Much controversy has surrounded these birds: only recently have they been well described anatomically, and at least one familiar skeleton appears to comprise several individuals. Although all original finds of these birds were from the Cretaceous marine rocks of Kansas, more recent discoveries have extended their known range across the Northern Hemisphere, into Europe and Asia. Ichthyornithiformes appear to have been important and diverse constituents of late Cretaceous ecosystems, dying out along with most lineages of dinosaurs at the end-Cretaceous mass extinction event. See EXTINCTION (BIOLOGY). Gareth Dyke Bibliography. S. Chatterjee, The Rise of Birds, Johns Hopkins University Press, Baltimore, 1997; L. M. Chiappe and L. M. Witmer, Mesozoic Birds: Above the Heads of Dinosaurs, University of California Press, Berkeley, 2002; A. Feduccia, The Origin and Evolution of Birds, 2d ed., Yale University Press, New Haven, 1999; S. L. Olson, The Fossil Record of Birds, in Avian Biology, vol. 8, Academic Press, New York, 1985.
Ichthyostega Four-legged vertebrates (basal tetrapods) that evolved from their lobe-finned fish ancestors during the later Devonian Period (400–350 million years ago) [see illus.]. Ichthyostega was the first Devonian tetrapod to be described and was found in East Greenland during the 1930s. For many decades it remained the only representative of the “fish-tetrapod transition” which was known from articulated skeletal fossils. Subsequently, a second genus from the same beds, Acanthostega, was more fully described, and during the 1990s Devonian tetrapod genera were recognized in other parts of the world. Like other very early tetrapods, Ichthyostega had a skull composed of an outer casing of bone (the skull roof) with a separate box containing the brain (the braincase) inside it. The skull roof bore the teeth, and the teeth of the upper jaw were substantially larger but fewer in number than those of the lower jaw. This suggests some specialization of diet, but it is not clear for what purpose. Ichthyostega was undoubtedly a predator. The braincase shows some highly unusual features, especially in the ear region, which are difficult to interpret and do not resemble those of any other early tetrapod. The postcranial skeleton shows some primitive and some very specialized features. The tail bore a fringe of fin-rays like those of a fish; these are lost in all other known early tetrapods except Acanthostega. The vertebrae consisted of several parts. The centra were formed of hoops of bone surrounding the
Reconstruction of the skeleton of Ichthyostega based on recent information. The animal was about 1 m (3 ft) long.
flexible supporting notochord. The notochord was the main point of attachment between the vertebral column and the head, inserting into the rear of the braincase. The neural arches supporting the spinal cord articulated with one another via well-developed joint surfaces (zygapophyses). The ribs were massive overlapping blades which formed a corset around the body and gave it rigidity. The animal would not have been able to bend very much. The massive shoulder girdle bore the large complex humerus which articulated with the shoulder at right angles to the body. The elbow was also more or less permanently fixed in a right angle, allowing the forearm to be used as a prop but permitting very little other movement. The hand is still unknown. The large pelvic girdle was attached to the vertebral column only by ligaments and muscles—no bony joint surfaces were involved as they are in modern tetrapods. The femur appears to have been about half the length of the humerus, and the hindlimb was paddlelike with tibia, fibula, and ankle bones all broad and flattened. The foot bore seven toes, with three tiny ones at the leading edge and four stout ones behind. It seems likely that the toes were enclosed in a web of skin. Because of its peculiarities, Ichthyostega has not been helpful in working out many ideas about the origin of tetrapods. Now however, it can be seen in the context of other Devonian tetrapods, and its specializations will help provide more information about how, where, and when early tetrapods first radiated onto the land. See AMPHIBIA. Jennifer A. Clack Bibliography. J. A. Clack, Devonian tetrapod trackways and trackmakers: A review of the fossils and footprints, Palaeogeog. Palaeoclimatol. Palaeoecol., 130:227–250, 1997; E. Jarvik, The Devonian tetrapod Ichthyostega, Fossils and Strata, 40:1– 213, 1996.
Igneous rocks Rocks that are formed from melted rock, usually by cooling. The melted rock is called magma when it is below the surface, and lava when it flows out onto the surface. When the magma or lava cools or moves into areas of lower pressure, crystals or glass or both form. Thus, igneous rocks are composed of crystals, glass, or combinations of these materials. Magmas sometimes erupt explosively, creating ash that is composed of broken crystals, glass, and rock
materials called pyroclastic material. Rocks formed at or very near the surface, including pyroclastic rocks, are called volcanic rocks, whereas those formed from magma at depth are called plutonic rocks. In the past, some rocks that formed below the surface but near it were called hypabyssal rocks. See LAVA; MAGMA. Like all rocks, igneous rocks are classified on the basis of their composition and texture. Both mineral and chemical compositions are used for classification, but the rocks are first subdivided by texture into three categories. (1) Volcanic rocks are dominated by interlocking grains too small to see with the unaided eye or a simple lens. (2) Plutonic rocks are composed almost entirely of interlocking grains big enough to see. (3) Pyroclastic rocks are composed of fragments (clasts) of glass, minerals, and preexisting rocks. See PETROLOGY. Composition. The range of igneous rock chemistries is relatively large. The common igneous rocks consist of silica (SiO2), 40 to 77%; alumina (Al2O3),
Fig. 4. Sea-surface temperature (°C) of the Indian Ocean in February and August.
Surface temperature. The pattern of sea-surface temperatures changes considerably with the seasons (Fig. 4). During February, when the Intertropical Convergence is near 10◦S, the heat equator is also in the Southern Hemisphere and most of the area between the Equator and 20◦S has temperatures near 82◦F (28◦C). The water in the northern parts of the Bay of Bengal and of the Arabian Sea is much cooler, and temperatures below 68◦F (20◦C) can be found in the northern portions of the Persian Gulf and the Red Sea. In the Southern Hemisphere temperatures decrease gradually from the tropics to the polar regions. Surface circulation affects the distribution of temperature, and warm water spreads south along the coast of Africa and cool water north off the west coast of Australia, causing the isotherms to be inclined from west to east. During August high temperatures are found in the Northern Hemisphere and in the equatorial region. The Somali Current advects cool water along the coast of Africa to the north. Simultaneously the Southwest Monsoon causes strong upwelling of cooler subsurface water, lowering average water temperature in August to less than 72◦F (22◦C) along Somalia, and to less than 75◦F (24◦C) along Arabia. In the Southern Hemisphere isotherms are almost 10◦ of latitude farther north. Surface salinity. The distribution of surface salinity is controlled by the difference between evaporation and precipitation and by runoff from the continents (Fig. 5). High surface salinities are found in the subtropical belt of the Southern Hemisphere, where evapora-
tion exceeds rainfall. In contrast, the Antarctic waters are of low salinity because of heavy rainfall and melting ice. Another area of low salinities stretches from the Indonesian waters along 10◦S to Madagascar. It is caused by the heavy rainfall in the tropics. The Bay of Bengal has very low salinities, as a result of the runoff from large rivers. In contrast, because of high evaporation the Arabian Sea has extremely high salinities. High salinities are also found in the Persian Gulf and in the Red Sea, representing the arid character of the landmasses surrounding them. The salinity distribution changes relatively little during the year; however, south of India from the Bay of Bengal to the west, a flow of low-salinity water, caused by the North Equatorial Current, can be noticed during February. Surface water masses. The different climatic conditions over various parts of the Indian Ocean cause the formation of characteristic surface water masses. The Arabian Sea water is of high salinity, has a moderate seasonal temperature variation, and can be classified as subtropical. The water in the Bay of Bengal is of low salinity and always warm, representing tropical surface water. Another type of tropical surface water stretches from the Indonesian waters to the west and is called the Equatorial Surface Water. Subtropical Surface Water has a seasonal temperature variation of between 59 and 77◦F (15 and 25◦C) and is found in the subtropical regions of the Southern Hemisphere. Its southern boundary is the Subtropical Convergence coinciding with temperatures of about 59◦F (15◦C). From there, temperature and salinity decrease in the area of the transition water to
0, there exists a number b > 0 such that when 0 < x < b, then y > a, and when −b < x < 0, then y < −a. This example indicates that it is sometimes useful to distinguish +∞ and −∞. The points +∞ and −∞ are pictured at the two ends of the y axis, a line which has no ends in the proper sense of euclidean geometry. Infinity in geometry. In geometry of two or more dimensions, it is sometimes said that two parallel lines meet at infinity. This leads to the conception of just one point at infinity on each set of parallel lines and of a line at infinity on each set of parallel planes. With such agreements, parts of euclidean geometry can be discussed in the terms of projective geometry. For example, one may speak of the asymptotes of a hyperbola as being tangent to the hyperbola at infinity. Note that the points at infinity which are adjoined to a euclidean line or plane are chosen in a manner dictated by convenience for the theory being discussed. Thus only one point at infinity is adjoined to the complex plane used for geometric representation in connection with the theory of functions of a complex variable. Other concepts. Other types of infinities may be distinguished when properties other than the mere cardinal number of a set are being considered. For
example, a line and a plane contain the same number of points, but when continuity considerations are important the line is said to be made up of a single infinity of points, whereas the plane has a double infinity, or ∞², of points. This is because there exists a one-to-one continuous correspondence between the line and a subset of the plane, but there does not exist such a correspondence between the whole plane and the line or any part of it. As another example, an infinite set may be ordered in different ways so as to have different ordinal numbers. See CALCULUS. Lawrence M. Graves Bibliography. R. G. Bartle and D. R. Sherbert, Introduction to Real Analysis, 3d ed., 1999; W. Rudin, Real and Complex Analysis, 3d ed., 1991.
Inflammation The local response to injury, involving small blood vessels, the cells circulating within these vessels, and nearby connective tissue. The early phases of the inflammatory response are stereotyped: A similar sequence of events occurs in a variety of tissue sites in response to a diversity of injuries. The response characteristically begins with hyperemia, edema, and adherence of the circulating white blood cells to endothelial cells. The white cells then migrate between the endothelial cells of the blood vessel into the tissue. The subsequent development of the inflammatory process is determined by factors such as type and location of injury, immune state of the host, and the use of therapeutic agents. Cardinal signs. The discomfort of inflammation may be attendant on a sunburn, a mosquito bite, a cut, an abscess, or a vaccination, and so it is not surprising that historical records of inflammation date to the earliest known medical writings, in the Egyptian papyri. It remained, however, for Celsus (about 30 B.C.–A.D. 38), a Roman, to enumerate the four cardinal signs of inflammation: redness, swelling, heat, and pain. Redness is caused by hyperemia, that is, an increased number of red blood cells in the local capillary bed. Heat results from increased flow of blood in the small blood vessels, and the edema, or swelling, represents the accumulation of extracellular fluid in the connective tissue. Stimulation of nerve endings by agents released during the inflammatory process causes pain. See EDEMA. While it is possible to observe the events of inflammation in human skin, a detailed study of the dynamic cellular events requires a more convenient site in an experimental animal. Tissues such as rat mesentery, frog tongue, hamster cheek pouch, or bat wing have the virtues of accessibility, vascularity, and transparency necessary for microscopic study of the inflammatory process. Vascular changes. Following a mild injury, there is a fleeting constriction of the smallest arterial branches (arterioles) in the viable tissue close to the injury, lasting from 5 s to about 5 min, followed by dilation of the same arterioles. That leads to engorge-
ment and dilation of the capillaries, a change that extends over an area larger than the constriction site. Constriction of the smallest veins (venules) may also contribute to capillary engorgement. Along with those blood vessel changes, the steadystate flow of water back and forth across the vessel walls is disrupted; when the outflow of water from the blood vessel exceeds the return, extravascular extracellular fluid collects. An excess of fluid in the extracellular compartment of the connective tissue is termed edema. The vessel walls also become more permeable to the large protein molecules that circulate in the plasma; those molecules leak into the tissue and induce the outward movement of water. Blood flow is slowed in the immediate vicinity of the injury by constriction of the venules and concentration of the blood cells. The latter phenomenon is a result of loss of water from the circulation. Cessation of blood flow (stasis) may occur and be associated with formation of a thrombus, a blood clot in the small vessels, or may be only transitory with prompt restoration of flow. See CIRCULATION; THROMBOSIS. The leukocytes, at first predominantly neutrophilic granulocytes, adhere to endothelial cells that line the capillaries and venules, and then migrate through the vessel wall into the edematous connective tissue, drawn there by chemotactic agents that are released at the site of the injury. If the tissue injury breaks small blood vessels, another mechanism comes into play. Exposure of circulating blood to collagen leads to clumping of platelets, and the resulting aggregate forms a temporary hemostatic plug that prevents free bleeding from the broken vessel. The clotting mechanism is set in action somewhat more slowly, and the fibrin formed thereby bolsters the plug. See COLLAGEN. Cellular changes. In contrast to the vascular response, the cellular response in inflammation is varied and serves to characterize the different types of inflammation. Participating cells come from two sources, the circulating blood and the local connective tissue. The circulating leukocytes are divided into six distinguishable cell types: granulocytes (eosinophil, neutrophil, and basophil), small and large lymphocytes, and monocytes. Fibroblasts and mast cells are solely connective-tissue cells. Macrophages and giant cells arise in the inflammatory locus from monocytes. See BLOOD. Neutrophils are normally the most abundant leukocytes in the circulation and are the first cells to accumulate in an inflammatory site. They have the capacity to ingest and kill bacteria; the cytoplasmic granules, which identify the neutrophil, contain the enzymes responsible for killing and digesting the bacteria. When the number of circulating neutrophils is greatly decreased, as in patients after treatment with certain drugs or exposure to nuclear radiation, frequent and severe infections may occur. In spite of antibiotic therapy, such patients often succumb to infection. See PHAGOCYTOSIS. Eosinophils are much rarer than neutrophils but are also phagocytic and may be considerably
increased in patients with allergies. The basophil, least abundant of the granulocytes, is similar to the tissue mast cell; both store and release histamine and are thus responsible for hives and the symptoms of hay fever. There is evidence that the beneficial effects of both eosinophils and basophils are exerted in dealing with multicellular parasites rather than bacteria or viruses. Lymphocytes are second to neutrophils in abundance and are very important in immune responses, including the rejection of grafts of foreign tissues, and the attack of donor cells. Monocytes, like lymphocytes, lack specific granules. This cell type when stimulated has the potential to change into a macrophage, which, like a neutrophil, can ingest and kill bacteria. Fibroblasts are responsible for synthesis of collagen and other components of the extracellular connective tissue during the healing process. See HISTAMINE. Cause-effect relationship. An inflammatory response may be induced in a variety of ways. Causes of inflammation include trauma, heat, ultraviolet light, x-rays, bacteria, viruses, many chemicals such as turpentine and croton oil, and certain inherently innocuous but foreign substances (antigens) which evoke immune responses. Although many of the components of inflammation are common to a variety of inflammatory responses, particularly in the early stages, there is considerable diversity in fully developed responses; the response to an abscess is very different from that of a burn, for instance. The character of the injury, its severity, and the site of injury modify the progress of the inflammatory response, as does therapeutic intervention. See ANTIGEN; IMMUNITY. A local inflammatory response is usually accompanied by systemic changes: fever, malaise, an increase in circulating leukocytes (leukocytosis), and increases in specific circulating proteins called acute-phase reactants. Such signs and symptoms are often helpful to the physician, first as clues to the presence of inflammation and later as an indication of its course. Mediators of inflammation. The process of inflammation, both vascular and cellular, is orchestrated by an array of molecules produced locally. They variously induce the adhesion of circulating leukocytes to the endothelium, increase small blood vessel permeability to proteins, increase blood flow, direct the migration of inflammatory cells, enhance phagocytic activity, incite proliferation and differentiation of inflammatory cell types locally, and elicit the systemic correlates of inflammation. These mediators include histamine, leukotrienes, prostaglandins, complement components, kinins, antibodies, and interleukins. Many anti-inflammatory drugs function by preventing the formation of those mediators or by blocking their actions on the target cells whose behavior is modified by the mediators. Types. Inflammation is frequently described in terms of its time course. Acute inflammation develops rapidly, in a matter of hours to days, and is of relatively short duration. In acute inflammation
the neutrophil is the predominant cell type, and hyperemia and edema are prominent. Chronic inflammation develops over a period of weeks to months and is characterized by an infiltrate of lymphocytes, monocytes, and plasma cells—the chief antibody-producing cells. Local differentiation and proliferation of macrophages is characteristic of chronic inflammation. Granulomatous inflammation is a specific type of chronic inflammation in which a discrete nodular lesion (granuloma) is formed of macrophages, lymphocytes, plasma cells, and giant cells arranged around a central mass of noncellular material. Granulomas, typical of tuberculosis and fungus infection, also occur in rheumatoid arthritis. See GRANULOMA INGUINALE; TUBERCULOSIS. Abscesses and cellulitis are specific forms of acute inflammation; the former term denotes a localized collection of pus composed of necrotic debris derived from dead tissue cells and neutrophils. Cellulitis is characterized by diffuse hyperemia and edema with an extensive neutrophil infiltrate, and often little tissue destruction. When inflammation causes erosion of an epithelial surface, the lesion is termed an ulcer. See ULCER. Function. Inflammation is basically a protective mechanism. The leakage of water and protein into the injured area brings humoral factors, including antibodies, into the locale and may serve to dilute soluble toxic substances and wash them away. The adherence and migration of leukocytes brings them to the local site to deal with infectious agents. There are also instances in which no causative toxic substance or infectious agent can be found to account for the inflammation. This is the case in rheumatoid arthritis and rheumatic fever. Such diseases may be examples in which an uncontrolled or misdirected inflammatory response with an autoimmune component is turned against the host. See ARTHRITIS; AUTOIMMUNITY; INFECTION; RHEUMATIC FEVER. David Lagunoff Bibliography. I. L. Bonta and M. A. Bray (eds.), The Pharmacology of Inflammation, 1985; J. I. Gallin et al. (eds.), Inflammation: Basic Principles and Clinical Correlates, 3d ed., 1999; G. P. Lewis, Mediator in Inflammation, 1986; J. G. Lombardino (ed.), Nonsteroidal Antiinflammatory Drugs, 1985; P. Venge and A. Lindbom, Inflammation: Basic Mechanisms, Tissue Injuring Principles, and Clinical Models, 1985.
Inflammatory bowel disease Inflammatory bowel disease is a general term for two closely related conditions, ulcerative colitis and regional enteritis or Crohn’s disease. The diseases can affect the colon, distal small intestine, and sometimes other portions of the gastrointestinal tract as well as several sites outside the gastrointestinal tract. In 15–25% of cases limited to the colon, ulcerative colitis and Crohn’s disease cannot be distinguished by clinical manifestations, x-ray examination,
or even pathology. For this reason the broad term inflammatory bowel diseases is useful. The cause of these diseases is unknown. Ulcerative colitis. Ulcerative colitis, an inflammatory condition limited to the colon, primarily affects the mucosa or lining of the colon. Marked inflammation gives rise to small ulcerations and microscopic abscesses that produce bleeding. The condition tends to be chronic, alternating between periods of complete remission and episodes of active and even life-threatening disease. Colitis may involve a varying extent of the colon; when limited to the rectum, it is known as ulcerative proctitis; it may be confined to the descending colon; or it may affect the entire colon, being known in this case as universal colitis. The most common symptoms include rectal bleeding and diarrhea; blood is almost always present during a flare-up. Occasionally, there may be signs of toxicity, with other constitutional symptoms. Manifestations of ulcerative colitis, as well as Crohn’s disease, outside the gastrointestinal tract include eye and joint inflammation and skin disorders. Extraintestinal disease occurs in 15–20% of patients with ulcerative colitis. The greatest concern in long-term management is the risk of colon cancer; ulcerative colitis must therefore be monitored as a precancerous condition. When limited to the rectum and sigmoid, the disease does not seem to increase the risk of colon cancer, but when it is present beyond those most distal segments, the risk begins after about 10 years of disease and progresses steadily. The risk of colon cancer can be evaluated by microscopic examination of the epithelial cells of the colon lining. Biopsies can be examined for cell changes known as dysplasia or precancer. If there is no dysplasia on multiple samples, the risk of cancer remains low; if dysplasia is found, the risk is high. Ulcerative colitis is treated with corticosteroids and a salicylate–sulfa drug combination known as sulfasalazine. If medication is not effective, surgical removal of the colon eliminates the disease but necessitates the creation of an opening in the abdominal wall from which the contents of the intestine can pass. The procedure, known as an ileostomy, redirects the lower end of the small intestine through the surface of the body where a collection appliance is attached to the skin. Alternatives to this method include an internal pouch or a valve mechanism. Other surgical techniques modify the distal rectum or anus so that evacuation can take place through the normal anatomic site. Crohn’s disease. Crohn’s disease, also known as regional enteritis, granulomatous colitis, and terminal ileitis, affects the colon and small intestine, and rarely the stomach or esophagus. Like ulcerative colitis, it is chronic and of unknown etiology. The two diseases have many similarities, including their treatment. Pathologically, however, the findings are usually distinct. In Crohn’s disease, chronic inflammation is present and is usually accompanied by granulomas.
The inflammation involves the full thickness of the intestinal wall, often with bowel narrowing and obstruction of the lumen. Abdominal cramps, alteration of bowel function, and diminished food intake are common. In the young, retardation of growth and of sexual maturation are frequently observed. Crohn’s disease involving different parts of the gastrointestinal tract produces symptoms related to those regions. For example, extensive disease of the distal small intestine can interfere with absorption of nutrients, particularly vitamin B12 and bile salts, which are absorbed somewhat specifically in that region. Deficiency of bile salts can lead to malabsorption of fats. Corticosteroid medications are utilized to treat the condition. Sulfapyridine used in ulcerative colitis can be effective, particularly when Crohn’s disease affects the colon. Metronidazole is effective for treating inflammation in and around the anal region, which is more common in Crohn’s disease than in ulcerative colitis. Surgery may be required in cases of bowel obstruction, hemorrhage, or severe debility. Although Crohn’s disease carries a slight increase in the risk of cancer, the risk is modest when compared to ulcerative colitis. See DIGESTIVE SYSTEM; GASTROINTESTINAL TRACT DISORDERS. Leonard A. Katz Bibliography. A. Anagostides, H. J. Hodgson, and J. B. Kirsner (eds.), Inflammatory Bowel Disease, 1991; S. B. Hanauer and J. B. Kirsner (eds.), Inflammatory Bowel Disease: A Guide for Patients and Their Families, 1985; D. Rachmilewitz (ed.), Inflammatory Bowel Diseases, 1986.
Inflationary universe cosmology A theory of the evolution of the early universe, motivated by considerations from elementary particle physics as well as certain paradoxes of standard big bang cosmology, which asserts that at some early time the observable universe underwent a period of exponential, or otherwise superluminal, expansion. In order to resolve the various paradoxes of the big bang theory, inflationary theories predict that during this inflationary epoch the scale of the universe increased by at least 28 orders of magnitude. The period of superluminal expansion (inflation) in most cases is caused by the appearance of a nonzero constant energy density in the universe associated with the postulated existence of a phase transition between different ground-state configurations of matter that occurs as the universe expands and cools. After the transition is completed, the constant energy density is converted into the energy density of a gas of relativistic particles. At this point, inflationary scenarios match the standard big bang cosmological model. Origin of inflationary models. The suggestion of an inflationary period during the early universe, in connection with a specific model, was first made in 1980 by A. Guth. (Somewhat earlier a less concrete
but not entirely unrelated possibility was discussed by A. A. Starobinskii.) Guth proposed, based on a consideration of recently proposed grand unification theories in particle physics, that phase transitions—associated with the breaking of certain symmetries of the dynamics governing the interactions of matter—could occur at sufficiently high temperatures and have important consequences in the early universe. Symmetry breaking and phase transitions. The concept of symmetry is of fundamental importance in physics. In particular, a symmetry can be associated with each of the known forces in nature, and each symmetry is, in turn, associated with the existence of certain conserved quantities, such as electric charge in the case of electromagnetism. Surprisingly, a physical system in its ground state may not possess a symmetry that could be fundamental to the basic physics governing its dynamics. The most familiar example is a ferromagnet made up of many individual elementary magnetic dipoles. The equations of electromagnetism are manifestly rotationally invariant: there is no intrinsic north or south. However, the ground state of such a spin system will involve all the spins aligned in one particular direction, yielding the familiar case of a permanent magnet. In this case, it is said that the rotational symmetry of the equations of electromagnetism has been spontaneously broken. The signal for this is that the average value of the total spin of the system points in a certain direction. If the system is heated, so that all the spins become excited and randomly oriented, the net spin of the system approaches zero, its magnetic field vanishes, and rotational symmetry is restored. The total spin of the system is referred to as an order parameter, because its value tells which state the system is in, and a change in the order parameter is an indication of the presence of a phase transition between different ground states of the system. See PHASE TRANSITIONS; SYMMETRY BREAKING; SYMMETRY LAWS (PHYSICS). Phase transitions in particle physics. A situation similar to the case described above occurs in particle physics.
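The ferromagnet example can be made concrete with a simple worked formula (a schematic Landau-type free energy, offered here only as an illustration and not taken from the original article). Writing M for the net magnetization, which serves as the order parameter, a minimal form is

\[
F(M) \;=\; a\,(T - T_c)\,M^{2} \;+\; b\,M^{4}, \qquad a,\ b > 0 .
\]

For T > Tc the only minimum of F lies at M = 0 and the rotational symmetry is intact; for T < Tc the minimum moves to M = ±√[a(Tc − T)/(2b)], a nonzero magnetization that singles out a direction and so breaks the symmetry spontaneously. The free-energy curves described below for the particle-physics order parameter φ (Figs. 2 and 4) play the analogous role.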
Fig. 1. Temperature (K) versus time (s) after the beginning of expansion for the standard big bang model. Before Tc, inflationary scenarios differ from the standard model.
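The numbers along the curve in Fig. 1 can be recovered with a rough scaling argument (an order-of-magnitude sketch, not a statement from the article; it ignores changes in the number of particle species). In the radiation-dominated era the temperature falls roughly as the inverse square root of the time, and the temperature is about 10¹⁰ K when the universe is about 1 s old, so

\[
T(t) \;\approx\; 10^{10}\,\mathrm{K}\left(\frac{1\ \mathrm{s}}{t}\right)^{1/2}
\quad\Longrightarrow\quad
T(10^{-35}\ \mathrm{s}) \;\approx\; 3\times 10^{27}\ \mathrm{K},
\]

which in energy units is of order 10¹⁴ GeV, comparable at the order-of-magnitude level to the energies at which grand unified symmetry breaking is expected to occur.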
At the temperatures and energies that occur in the universe now, there is a vast difference in the nature of the known fundamental forces. For two of these forces—the electromagnetic force, and the so-called weak force, which governs the beta decay processes important to energy production in the Sun—it is now known that the perceived differences are the result of spontaneous symmetry breaking. Above a certain energy or temperature, the weak and electromagnetic interactions appear exactly the same. However, at a critical temperature, the ground state of matter, which in particle physics is called the vacuum state, breaks the symmetry relating the two interactions, and below this temperature they appear quite different. The signal for this symmetry breaking is again the appearance of a nonzero value in a certain order parameter of the system; in this case it is referred to as a vacuum expectation value. In the case of the ferromagnet, the order parameter was a spin, and the ground state after symmetry breaking had a net magnetization. In the case of the particle physics system, the order parameter describes the ground state expectation value of a certain elementary particle field, and the vacuum state after symmetry breaking has a net charge that would have been zero had the symmetry not been broken. In particle physics, if a symmetry is broken at a certain scale of energy, particles that transmit the force of nature associated with that symmetry can have masses characteristic of this energy scale. This is indeed what happens in the case of the weak interactions, which are weak precisely because the particles that transmit the weak force are so massive. However, once the temperature of a system is so great that the energy it takes to produce such a massive particle is readily available in thermal energy, the distinction between this force and electromagnetism, which is transmitted by massless particles (photons), disappears and the two forces become unified into one. Shortly after it was recognized that these two forces could be unified into one, it was proposed that all three observed forces in nature outside of gravity might be combined into one grand unified theory. There are reasons to believe that the scale of energy at which all the forces would begin to appear the same is truly astronomical—some 16 orders of magnitude greater than the mass of the proton. Such energies are not achieved in any terrestrial environment, even in the laboratory. The only time when such a scale of energy was common was in the earliest moments of the big bang fireball explosion. See ELEMENTARY PARTICLE; FUNDAMENTAL INTERACTIONS; GRAND UNIFICATION THEORIES. “Old” inflationary model. Based on these facts, Guth reasoned that if grand unification symmetries are indeed broken at some large energy scale, then a phase transition could occur in the early universe as the temperature cooled below the critical temperature where symmetry breaking occurs. According to the standard big bang model of expansion, the time at which this would occur would be about 10⁻³⁵ s after the initial expansion had begun (Fig. 1). This
time is far earlier than the times previously considered in cosmological analyses using the big bang model. For instance, the time at which it is predicted that many of the light elements were first formed in the process of primordial nucleosynthesis is of the order of 1 s, some 35 orders of magnitude later in time. First-order phase transitions. As Guth demonstrated, the effects of such a phase transition in the early universe could be profound. In order to calculate the dynamics of a phase transition, it is necessary to follow the behavior of the relevant order parameter for the transition. This is done by determining the energy (actually the thermodynamic free energy in the case of systems at finite temperatures) of a system as a function of the order parameter, and following the changes in this energy as the temperature changes. Typically (Fig. 2), at some high temperature T, the minimum of the relevant energy function is at zero value of the order parameter, the vacuum expectation value of a certain field. Thus the ground state of the system, which occurs when this energy is a minimum, will be the symmetric ground state. This case is analogous to the spin system, which at high temperatures is disordered, so that no preferred direction is picked out. As the temperature is decreased, however, at a certain critical temperature Tc a new minimum of the energy appears at a nonzero value of the order parameter. This is the symmetry-breaking ground state of the system. See FREE ENERGY. How the system makes a transition between the original, symmetric ground state and the new ground state depends on the shape of the energy curve as a function of the order parameter. If there is a barrier between the two minima (Fig. 2), then classically the system cannot make a transition between the two states. However, it is a well-known property of quantum mechanics that the system can, with a certain very small probability, tunnel through the barrier and arrive in the new phase. Such a transition is called a
Fig. 2. Free energy V(φ) as a function of the order parameter φ, shown for temperatures T > Tc, T = Tc, and T < Tc, for the case of a first-order transition that is typical of symmetry breaking in grand unification theories.
first-order phase transition. Because the probability of such a tunneling process is small, the system can remain for a long time in the symmetric phase before the transition occurs. This phenomenon is called supercooling. When the transition finally begins, “bubbles” of new phase locally appear throughout the original phase. As more and more bubbles form, and the original bubbles grow (in the case of such a phase transition in particle physics the bubbles grow at the speed of light), eventually they combine and coalesce until all of the system is in the new phase. Another example of this type of phenomenon is the case of a supercooled liquid such as water that has been continually stirred as it has been cooled. At a certain point, ice crystals will spontaneously form throughout the volume of water, and grow until all of the water has turned to ice. See NONRELATIVISTIC QUANTUM THEORY. Inflation. Such a phase transition, when treated in the context of an expanding universe, can result in remarkable behavior. The dynamics of an expanding universe are governed by Einstein’s equations from the theory of general relativity. When these equations are solved by using the properties of ordinary matter, it is seen that the rate of expansion of the universe, characterized by the Hubble parameter H— which gives the velocity of separation of objects— slows down over time in a way that is dependent on the temperature of the universe, T. (Specifically, in a radiation-dominated universe, H is dependent on T 2.) This slowdown occurs because the energy density of matter, which is driving the expansion, is decreased as the matter becomes more dilute because of the effects of expansion. However, once the temperature is below the critical temperature for the transition (Fig. 2), the metastable symmetric phase has a higher energy than the new lower-energy symmetry-breaking phase. Until the transition occurs, this means that the symmetric phase has associated with it a large constant energy density, independent of temperature. When this constant energy density is placed on the right-hand side of Einstein’s equations, where the energy density of matter appears, it is found that the resultant Hubble parameter describing expansion is a constant. Mathematically, this implies that the scale size of the universe increases exponentially during this supercooling phase. This rapid expansion is what is referred to as inflation. Once the phase transition is completed, the constant energy density of the original phase is converted into the energy density of normal matter in the new phase. This energy density decreases with expansion, so that the universe reverts to the slower rate of expansion characteristic of the standard big bang model. See HUBBLE CONSTANT; RELATIVITY. Successes of inflation. Guth pointed out that a period of exponential expansion at some very early time could solve a number of outstanding paradoxes associated with standard big bang cosmology. Flatness problem. In 1979, it was pointed out that present observations seem to imply that either the present era is a unique time in the big bang
expansion or the initial conditions of expansion had to be fine-tuned to an incredible degree. The observation involves the question of whether the universe is now open, closed, or flat. A closed universe is one where there is sufficient mass in the universe to eventually halt and reverse the observed expansion because of net gravitational attraction; an open universe will go on expanding at a finite rate forever; and a flat universe forms the boundary between these two cases, where the expansion rate will continually slow down and approach zero asymptotically. Measurements of the observed expansion rate, combined with measurements of the observed mass density of the universe, yield a value for the density parameter Ω that rapidly approaches 0 for an open universe, infinity for a closed universe, and is exactly equal to 1 for a flat universe. All measurements of Ω yield values between about 0.1 and 2. What is so strange about this measurement is that theory suggests that once the value of Ω deviates even slightly from 1, it very quickly approaches its asymptotic value far away from 1 for open or closed universes. Thus, it is difficult to understand why, after 10¹⁰ years of expansion, the value of Ω is now so close to 1. At a quantitative level, the problem is even more mysterious. In order for Ω to have its present value in the range given above, at the time of nucleosynthesis the value of Ω would have had to have been equal to 1 within 1 part in 10¹⁵. Inflation naturally explains why Ω should exactly equal 1 in the observable universe today. Einstein’s equations for an expanding universe can in the cases of interest be written in the form of the equation below,
1 + K/[R(t)² ρ(t)] = Ω(t)
where t is the time after the beginning of expansion, R is the cosmic scale factor, related to the size of the universe in closed universe models, ρ is the energy density of matter, and K is a constant that is equal to 0 for flat universe models. Normally, for an expanding universe, the energy density of matter decreases with scale at least as fast as 1/R³, so that the left-hand side deviates from 1 as time goes on. This is the quantitative origin of the flatness problem. However, in an inflationary phase, ρ(t) remains constant, while R increases exponentially. Thus, during inflation Ω(t) is driven arbitrarily close to 1 within the inflated region. If there are some 28 orders of magnitude of exponential expansion, which is possible if supercooling lasts for some time, then Ω(t) need not have been finely tuned to be close to 1, even if inflation occurred as early as the Planck time (10⁻⁴⁵ s). Horizon problem. An equally puzzling problem that inflationary cosmology circumvents has to do with the observed large-scale uniformity of the universe. On the largest observable scales, the universe appears to be largely isotropic and homogeneous. In particular, the 3-K microwave radiation background, which in different directions has propagated from distances separated by 10¹⁰ light-years, is known to be uniform
in temperature to about 1 part in 100,000. See COSMIC BACKGROUND RADIATION. This observed uniformity may not seem so puzzling, until an attempt is made to derive it in the context of any set of reasonable initial conditions for the big bang expansion. Physical effects can propagate at best at the speed of light. The distance a light ray could have traveled since the big bang explosion, which is also thus the farthest distance at which one object can affect another, is called the horizon. The horizon size increases linearly with time, since light travels with a constant velocity even in an expanding universe. Thus, the size of the horizon today, which is about 10¹⁰ light-years, is much larger than the horizon size at the time the radiation in the microwave background was emitted. In particular, in the standard big bang model the sources of the radiation observed coming from opposite directions in the sky were separated by more than 90 times the horizon distance at the time of emission. Since these regions could not possibly have been in physical contact, it is difficult to see why the temperature at the time of emission was so uniform in all directions. Even if some very isotropic initial conditions are postulated for the big bang expansion to account for this uniformity, a quantitative problem similar to the flatness problem is encountered in attempting to account for the degree of uniformity now observed. Small fluctuations in energy density tend to grow as the universe evolves because of gravitational clumping. Indeed, observed clumping on the scale of galaxies and smaller attests both to this fact and to the existence of some initial irregularities in the distribution of matter. However, since irregularities grow but the universe is relatively smooth on large scales, any initial conditions for the big bang expansion imposed at early times, say 10⁻⁴⁵ s, would have to involve an absurdly uniform distribution of matter on the largest scales. Inflation solves the horizon problem very simply. If the observed universe expanded by 28 orders of magnitude in a short period at a very early time, then in inflationary cosmology it originated from a region 10²⁸ times smaller than the comparable region in the standard big bang model extrapolated back beyond that time. This makes it quite possible that at early times the entire observed universe was contained in a single horizon volume (Fig. 3). Problems with “old” inflation. The list of cosmological problems resolved by the original inflationary universe model is quite impressive. Moreover, its origin in current elementary-particle-physics ideas made it more than an artificial device designed specifically to avoid cosmological paradoxes. Unfortunately, the original inflationary scenario was fundamentally flawed. The central problem for this scenario was how to complete the phase transition in a uniform way. The phase transition began by the formation of bubbles of one phase nucleating amidst the initial metastable phase of matter. While these bubbles grow at the speed of light once they form, the space between the bubbles is expanding exponentially. Thus, it is extremely difficult for bubbles to
Fig. 3. Horizon size and radius of the observable universe (cm) as functions of time (s) after beginning of expansion, in the standard big bang and inflationary models. (After A. H. Guth and P. J. Steinhardt, The inflationary universe, Sci. Amer., 250(5):116–128, May 1984)
eventually occupy all of space, as is required for the completion of the transition. Moreover, even if bubbles are forming at a constant rate, each region of space quickly becomes dominated by the largest bubble, which formed earliest. Collisions with much smaller bubbles in this region will not adequately or uniformly dissipate the energy of this largest bubble, so that even if the transition does manage to percolate through space, the final state will be very nonuniform on all scales. Finally, topologically stable objects such as magnetic monopoles and domain walls can form at the intersection of bubbles, and end up with densities after inflation well above those allowed by observation. See MAGNETIC MONOPOLES. While these problems motivated the development of a “new” inflationary scenario, described below, in 1989 it was pointed out that in certain previously examined special models related to so-called Brans-Dicke models of gravitation, the gravitational constant can itself vary during an inflationary phase transition. In this case, the inflationary expansion rate, while very fast, need not be exponentially fast, so that bubbles growing at the speed of light can eventually percolate throughout all of space, thereby completing the phase transition and ending the inflationary expansion. However, many details of these rather special scenarios, which may still leave large remnant inhomogeneities on the scale where bubbles coalesce, remain to be worked out. See GRAVITATION. “New” inflationary cosmology. In 1981 a new approach was developed by A. D. Linde, and independently by A. Albrecht and P. J. Steinhardt, which has since become known as new inflation. They suggested that if the energy function (potential; Fig. 2) were slightly changed, then it might be possible to maintain the successful phenomenology of the old inflationary model while avoiding its problems. In particular, they considered a special form of the potential that is extremely flat at the origin (Fig. 4). Most important, such functions have essentially no barrier separating the metastable from the stable phase at low temperature. Now, when the universe cooled down below the critical temperature, the order parameter could continuously increase from zero instead of tunneling discretely to a large nonzero value. As long as the potential is sufficiently flat near the origin, however, it can take a long time before the order parameter approaches its value at the true minimum of the potential (a so-called slow-rollover transition), at which point the region will be in the new phase. During this time the region of interest can again be expanding exponentially because of the large constant energy density that is maintained while the order parameter remains near the origin. Thus, in some sense a single bubble can undergo inflation in this scenario. If the amount of inflation in this region is sufficient, the whole observable universe could have originated inside a single inflating bubble.
Fig. 4. Free energy V(φ) as a function of the order parameter φ, shown for temperatures T > Tc, T = Tc, and T < Tc, for the case of a so-called slow-rollover transition that appears in the new inflationary cosmology.
This key difference between old and new inflationary cosmology accounts for the ability of the latter to bypass the problems of the former. Because in new inflation the observable universe grew from a single inflating bubble, problems of inhomogeneity and topological defects resulting from bubble percolation and collisions are avoided, at least as far as observational cosmology is concerned. Moreover, the problem of completing the phase transition is naturally avoided. Once the order parameter in the inflating region approaches the steep part of the potential, the large constant energy density of the initial phase is reduced, and inflation slows. This energy density is converted into a form of kinetic energy of the changing order parameter. This motion of the order parameter is reflected in a changing charge of the new ground state of matter. Just as a time-varying electric charge produces electromagnetic radiation, so the variation of the order parameter produces a thermal background of real particles. By the time the order parameter has settled at the new minimum of the potential, all of the original constant energy density of the symmetric phase of matter has been converted into energy of real particles at a finite temperature. (Any particle density that existed before inflation has, of course, been diluted away by the vast expansion of the region during inflation.) The evolution from this point on is that of the standard
big bang model, with the initial conditions being a uniform density of matter and radiation at some finite temperature. Problems with new inflation. New inflation, too, is not without its problems. The type of potential (Fig. 4) needed for a new inflationary scenario is not at all as generic as that postulated in the old inflationary model (Fig. 2). To have such a slow-rollover transition, the parameters of particle physics models must be finely tuned to some degree. Moreover, the temperature to which the universe is reheated after inflation must be large enough that, at the very least, big bang nucleosynthesis can proceed afterward. This again rules out some possible models. No clear candidate model for new inflation has emerged from particle physics. Another potential problem for any inflationary scenario concerns initial conditions. As discussed above, if an inflationary phase precedes the standard big bang expansion, then it is possible to resolve problems of the standard big bang model related to the unphysical fine tunings that seem necessary at time zero in order for the big bang to evolve into its presently observed form. However, there is also the question of how generic such an inflationary phase is—namely, how special the initial preinflationary conditions must be so that space-time will undergo an inflationary transition in the first place. If these too are unphysical, then inflation may not have really solved any fundamental problems. While Guth suggested in his initial work that the preconditions for inflation were not severe, this view has been questioned by some. Attempts have been made to develop an existence proof that not all initial configurations of the universe can be made isotropic by inflation. This, of course, does not address the fundamental issue of whether such configurations are at all generic. In 1983, Linde proposed a version of inflation, which he called chaotic inflation, that may in principle address this issue. In particular, he argued that inflation may have occurred very generally in the early universe, even if there was no related phase transition. As long as a scalar field existed with a potential whose scale of energy was sufficiently close to the Planck scale, then, if quantum fluctuations carried the field expectation value far enough away from the origin, the field would relax slowly enough that its energy density would remain nonzero and roughly constant for some time, causing an inflationary phase of expansion. Such a possibility led Linde to suggest that the early universe may have been arbitrarily inhomogeneous. In some regions, inflation may have successfully taken place, even without the spontaneous symmetry breaking associated with a grand unified theory, and in other regions it may not have. He then suggested that the regions in which inflation did take place are in fact most probable. In particular he argued that those values of parameters that allow large amounts of inflation to take place are most probable, if an initial random distribution of regions in the preinflation universe governed by different metric parameters is allowed. In addition,
he pointed out that life would form only in those regions that became sufficiently isotropic so that, if many different regions of the universe now exist, it is not surprising that humans live in one that has undergone inflation. This argument, a reformulation of the so-called anthropic principle, is, however, difficult to quantify. In any case, the issue of initial conditions for inflation is the subject of much research, but may require an understanding of quantum gravity for its eventual resolution. See CHAOS; QUANTUM GRAVITATION. Predictions. Whether or not there exists a specific model for new inflationary cosmology, and whether or not understanding inflation may require further refinements in the understanding of quantum gravity, the implications of an inflationary phase in the early universe are profound. As discussed above, many of the fundamental paradoxes of standard big bang cosmology can be resolved. Moreover, it has been demonstrated that new inflationary cosmology naturally allows a derivation from first principles of the spectrum of primordial energy density fluctuations responsible for galaxy formation. Before new inflation, there existed no method, even in principle, for deriving this spectrum for the standard big bang model; it always had to be input as an artificial addition. Remarkably, the spectrum that emerges from inflationary models is a so-called scale-invariant spectrum of perturbations. It is exactly this type of spectrum that had been postulated 10 years before the formulation of the new inflationary model to account for the observed properties of galaxies while maintaining agreement with the observed anisotropy of the microwave background. In a remarkable discovery, the first observation of anisotropies in the microwave background was announced in April 1992, based on the analysis of 2 years of data on the microwave background structure using the differential microwave radiometer experiment aboard the Cosmic Background Explorer (COBE) satellite launched by the National Aeronautics and Space Administration (NASA) in 1989. A quadrupole anisotropy in the background was observed at a level of about 5 × 10−6, just in the range that might be expected for primordial fluctuations from inflation that might also result in the observed distribution of galaxies. Moreover, the correlation of temperature deviations observed across the sky on scales greater than about 10◦ is remarkably consistent with the scale-invariant spectrum predicted by inflationary models. While neither of these observations conclusively proves the existence of an inflationary phase in the early universe, the fact that they are clearly consistent with such a possibility, and at present with no other scenario, gives great confidence that inflationary models may provide the correct description of the universe in the era preceding the present observed isotropic expansion. Theoretical and observational factors. Since the early 1990s several factors, both theoretical and observational, have dominated thinking about possible inflationary models.
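Before turning to these developments, it may help to put rough numbers on the amount of inflation invoked earlier in this entry (a back-of-the-envelope sketch, not a calculation taken from the article). Growth of the scale factor R by 28 orders of magnitude corresponds to

\[
N \;=\; \ln\!\frac{R_{\mathrm{after}}}{R_{\mathrm{before}}} \;=\; \ln 10^{28} \;\approx\; 64
\]

e-foldings of the exponential expansion, during which R grows in proportion to e^{Ht}. Because the energy density ρ stays essentially constant while R grows by this factor, the flatness-problem relation 1 + K/[R(t)²ρ(t)] = Ω(t) implies that any initial deviation of Ω from 1 is suppressed by a factor of roughly (10²⁸)² = 10⁵⁶, more than enough to account for the tuning of 1 part in 10¹⁵ that would otherwise be required at the time of nucleosynthesis.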
Hybrid inflation. As the theoretical notion of supersymmetry—a symmetry relating the two different kinds of elementary particles in nature, bosons (integer-spin particles) and fermions (half-integer-spin particles)—has begun to dominate the model building of elementary particle physics, the kind of flat potentials which allow long slow rollover periods of inflation have become more widely used. The problem in such models is not how to have sufficient inflation but rather how to end inflation. A new mechanism, which can be relatively natural in some supersymmetric models, has been proposed in this regard. Called hybrid inflation, this mechanism relies not on the evolution of a single-field order parameter but on several interacting fields. In this case, the evolution of these fields as the universe evolves can be more complex. In certain cases, the preferred, lowest free-energy state of matter can evolve along the flat direction, but as the value of the corresponding field evolves, its couplings to other fields can, at a certain critical value, cause the lowest free-energy state to evolve along a nonflat direction, abruptly ending inflation. The advantages of these models are that inflation can end without fine tuning, primordial density fluctuations that are of an acceptable magnitude can be generated without a great deal of fine tuning, and the models are reasonably well motivated on particle physics grounds. Unfortunately, model building in this area is still sufficiently murky that no obvious set of phenomenologically acceptable models yet exists. See SUPERSYMMETRY. It has also been proposed that the generation of the observed matter-antimatter asymmetry in nature might not occur at temperatures corresponding to the grand unified scale, where the symmetry breaking which motivated Guth’s original model is thought to occur, but at much lower energies, corresponding to the scale where the symmetry between the weak and electromagnetic interactions breaks. In this case, it may be possible that following inflation the universe thermalizes at a temperature far below that associated with the original inflationary models. This relaxes some of the constraints on inflationary models, and again favors the possible existence of long flat directions in the potential that might be associated with supersymmetric models. See ANTIMATTER; ELECTROWEAK INTERACTION; STANDARD MODEL; WEAK NUCLEAR INTERACTIONS. Preheating. Original calculations suggested that, following inflation, the universe would quickly thermalize to a temperature comparable to the energy density stored in the original symmetric preinflationary phase of matter. Such a scenario was called reheating. Subsequent calculations have shown that a very different behavior can take place. As the order parameter relaxes about the new minimum of the potential following inflation, its vibrations can release energy in certain nonthermal modes. As a result, immediately following inflation a finite fraction of the released energy might exist in such nonthermal modes. Thus, estimates of particle number densities based on the assumption of thermal equilibrium
may be dramatically wrong. Among the possible consequences of this situation, called preheating, is the generation of a large density of topological defects such as cosmic strings, and possibly magnetic monopoles in the postinflationary phase. Since one of the original purposes of inflation was ridding the universe of such defects, it is not yet clear what the implications of this new possibility are for the health of inflationary models. See COSMIC STRING. Dark energy, branes, and a flat universe. Finally, since 1995 it has been increasingly clear, based on measurements of the total clustered mass in the universe in galaxies and clusters of galaxies, that there is not sufficient matter to result in a flat universe today. However, it is now recognized that matter accounts for only part of the total energy density of the universe, with the remainder being associated with some new form of energy, possibly energy stored in the ground state of empty space, called a cosmological constant. This energy may in fact be identical to the energy that is stored during the inflationary phase itself. If this is true, we live in an inflationary universe today. Measurements of the expansion rate of the universe as a function of cosmic time, by observing the recession velocities of certain types of exploding stars, called type Ia supernovae, in very distant galaxies, argue that this is precisely the case, with 30% of the energy of a flat universe in matter and 70% in dark energy. It will probably be necessary to rethink many of the current microphysical notions about the expanding universe to accommodate this newly discovered dark energy. See ACCELERATING UNIVERSE; COSMOLOGICAL CONSTANT; DARK ENERGY; SUPERNOVA. A great deal of theoretical work has been devoted to exploring the possibility that our four-dimensional universe might be embedded in a universe of higher dimensions. In such models—in which our four-dimensional universe, which forms a structure called a brane, can be impacted upon by higher-dimensional physics—the possibility exists that inflation might be caused by new physics associated with other branes embedded in higher-dimensional spaces, instead of being associated with phase transitions due purely to physics on our brane. However, these ideas remain highly speculative. See SUPERSTRING THEORY. Results and prospects. The next-generation advanced cosmic microwave background satellite, the Wilkinson Microwave Anisotropy Probe (WMAP), has been able to probe many of the fundamental parameters of cosmology, including the Hubble expansion rate, the overall matter density, the spectrum of primordial density fluctuations, and the possible existence of a cosmological constant, to an accuracy of better than 10%. WMAP has confirmed many of the generic predictions of inflation, including a flat universe and a roughly constant spectrum of primordial fluctuations. Even more significant perhaps, it achieved sufficient accuracy to determine that the spectrum is not precisely scale-invariant, but changes with wavelength very slightly in a way that is precisely
that expected by many inflationary models. (Inflation predicts only a roughly scale-invariant spectrum when detailed estimates are performed. The fluctuations spectrum actually increases slightly with increasing wavelength.) While this is not proof that inflation actually occurred, the consistency between theoretical expectations and observations provides further support for this idea. Moreover, detailed measurements of the spectrum of primordial fluctuations can actually rule out some inflationary models. See WILKINSON MICROWAVE ANISOTROPY PROBE. While all of the current data supports the idea of inflation, because many different inflationary models exist, the question remains whether there is a prediction that can be probed that is unique to inflation. Interest currently is focused on the possibility of measuring a primordial spectrum of gravitational waves, which is also generated during an inflationary phase of the early universe. The spectrum of these waves would also be close to scale-invariant, and the amplitude of the waves would depend upon the energy scale at which inflation occurred. Such a spectrum might produce a measurable polarization in the cosmic microwave background at a level that might be probed by Planck, the next-generation cosmic microwave background satellite, or by future ground-based detectors. A positive signal could provide very strong evidence for inflation, but unfortunately the absence of a signal would not provide definitive evidence against it. See GRAVITATIONAL RADIATION. For the moment, inflation remains a beautiful and well-motivated theoretical possibility, which is now more strongly supported by all existing data, and in fact is the only such idea that is consistent with all existing cosmological observations. Hopefully, theoretical and observational developments will help settle the question of whether or not it actually occurred. See BIG BANG THEORY; COSMOLOGY; UNIVERSE. Lawrence M. Krauss Bibliography. C. L. Bennett, M. S. Turner, and M. White, The cosmic Rosetta stone, Phys. Today, 50(11):32–38, November 1997; A. H. Guth, Inflationary universe: A possible solution to the horizon and flatness problems, Phys. Rev., D23:347–356, 1981; A. H. Guth and P. J. Steinhardt, The inflationary universe, Sci. Amer., 250(5):116–128, May 1984; L. Krauss, Cosmological antigravity: The case for a cosmological constant, Sci. Amer., 282(1):52–59, January 1999; L. Krauss, Quintessence: The Mystery of the Missing Mass in the Universe, 2000; A. D. Linde, Eternally existing self-reproducing chaotic inflationary universe, Phys. Lett., B175:395–400, 1986; A. D. Linde, The inflationary universe, Rep. Prog. Phys., 47:925–986, 1984; D. N. Spergel et al., First-year Wilkinson Microwave Anisotropy Probe (WMAP) observations: Determination of cosmological parameters, Astrophys. J. Suppl., 148:175–194, 2003; D. N. Spergel et al., Wilkinson Microwave Anisotropy Probe (WMAP) three year results: Implications for cosmology, Astrophys. J., in press, 2006.
Inflorescence A flower cluster segregated from any other flowers on the same plant, together with the stems and bracts (reduced leaves) associated with it. Certain plants produce inflorescences, whereas others produce only solitary flowers. The stalks of the individual flowers are the pedicels. Inflorescences are easily identified in certain species, though not in others. For example, it is often difficult, if not arbitrary, to determine whether flowers on a plant are sufficiently segregated from each other to be interpreted as solitary flowers or components of one or more inflorescences. In other cases, such as species of Solidago (goldenrod), a plant produces flowers in small clusters conventionally designated as inflorescences, but these inflorescences are aggregated into a very conspicuous inflorescencelike cluster at the tip of the plant. Branching. Patterns of branching in inflorescences, as well as other portions of the plant, are classified under two general categories, monopodial and sympodial. Distinctions between these categories are not absolute, and a given inflorescence may exhibit one or both patterns. In monopodial branching, each branch is subordinate to that which bears it, and stems may bear numerous lateral branches (including flowers; illus. a and o). In sympodial branching, each branch is more or less equivalent to the branch which bears it, generally terminates in a flower, and tends to be short, bearing one or a few lateral branches (illus. h–n). In some cases of sympodial branching, a structure which superficially appears as a single branch (that is, one derived from a single apical meristem) is actually composed of several branches. Classification. Inflorescences are classified according to gross appearance at maturity or pattern of branching. Depending on the species, an inflorescence may be regarded as belonging to one or more types; for example, the inflorescence of Limnocharis flava (velvet leaf) is properly classified as an umbel (based on gross appearance), yet also as a compound monochasium or bostryx (based on branching pattern). Types of inflorescences include: raceme (illus. a), composed of a monopodial central axis, along the sides of which arise pedicel-bearing flowers; corymb (illus. b), a raceme whose pedicels are proportionally elongated so as to bring all the flowers more or less to the same level; umbel (illus. c), composed of several flowers which radiate from almost the same point; compound umbel (illus. d), composed of several branches which radiate from almost the same point, each branch terminating in an umbellike flower cluster; spike (illus. e), composed of flowers which lack or virtually lack pedicels, arranged singly or in contracted clusters along a central axis; catkin (ament; illus. f), a slender, usually scaly spike bearing unisexual, apetalous flowers, which often hangs downward and is deciduous as a whole; spadix (illus. g), a spike with a thickened fleshy axis, usually enveloped by one or more spathes (large modified leaves); dichasium (simple dichasium; illus. h),
[Line drawings of inflorescence types a–r, with flowers, bracts, pedicels, and a peduncle labeled.]
Examples of inflorescence types. All are side views except k′, l′, m′, and n′, which are top views. Numbers in k–n indicate flowers in order of development. (a) Raceme. (b) Corymb. (c) Umbel. (d) Compound umbel. (e) Spike. (f) Catkin. (g) Spadix (fruiting, Ludovia integrifolia). (h) Dichasium. (i) Compound dichasium. (j) Monochasium. (k, k′) Bostryx. (l, l′) Cincinnus. (m, m′) Rhipidium. (n, n′) Drepanium. (o) Panicle. (p) Verticillate inflorescence. (q) Cyme. (r) Head.
a three-flowered cluster composed of an axis which bears a terminal flower and, below it, two bracts which subtend lateral flowers; compound dichasium (illus. i), composed solely of dichasia; monochasium (simple monochasium; illus. j), a two-flowered cluster composed of an axis which bears a terminal flower and, below it, one bract which subtends a lateral flower; compound monochasium (illus. k–n),
composed solely of monochasia; bostryx (helicoid cyme; illus. k and k′), a compound monochasium or modified compound dichasium in which successive pedicels describe a spiral; cincinnus (scorpioid cyme, illus. l and l′), like a bostryx, but successive pedicels follow a zigzag path; rhipidium (illus. m and m′), a bostryx compressed so that all flowers lie in one plane; drepanium (illus. n and n′), a cincinnus
compressed so that all flowers lie in one plane; panicle (illus. o), composed of a monopodial central axis which bears flowers indirectly on branches of higher order and which also may bear some flowers directly; thyrse, a compact panicle of more or less cylindrical form; verticillate inflorescence (illus. p), like either a raceme or panicle, but the central axis bears flowers in whorls, approximate whorls, or contracted clusters which simulate whorls; cyme (illus. q), a compound, more or less flat-topped sympodial inflorescence, sometimes simulating a compound umbel but having branches less regularly disposed; and head (capitulum; illus. r), compact inflorescence having a very short, often discoid or globose flower-bearing portion. Formerly, the aforementioned inflorescence types were classified under two main categories, indeterminate (racemose) and determinate (cymose); however, this classification is now obsolete. See FLOWER. George J. Wilder Bibliography. H. C. Bold et al., Morphology of Plants and Fungi, 1980; H. W. Rickett, The classification of inflorescences, Bot. Rev., 10:187–231, 1944; H. W. Rickett, Materials for a dictionary of botanical terms, III: Inflorescences, Bull. Torrey Bot. Club, 82:419–445, 1955.
Influenza An acute respiratory viral infection characterized by fever, chills, sore throat, headache, body aches, and severe cough; it is commonly known as the flu. While many viruses can cause respiratory infections such as the common cold, influenza viruses are more likely to cause severe illness and to result in serious medical complications. Although gastrointestinal symptoms may sometimes accompany influenza infection, especially in children, the term “stomach flu” is a misnomer for gastrointestinal illnesses caused by a variety of other viruses, bacteria, or other agents. See VIRUS. Another unique feature of influenza is its distinctive seasonal pattern. Outbreaks of influenza occur during cold weather months in temperate climates, and typically most cases cluster during a period of 1–2 months, in contrast to broader periods of circulation with many other respiratory viruses. During annual influenza epidemics, an increased death rate due to influenza-related complications is often observed, and is one of the ways that public health authorities monitor epidemics. During an average influenza season in the United States, more than 200,000 people are hospitalized and 36,000 die as a result of such complications. See PUBLIC HEALTH. Viral agents. The bacterium Hemophilus influenzae was once mistakenly considered the cause of influenza, based on finding the microbe in the sputum of individuals with the disease. However, viruses were shown to be the cause in 1933, when the first influenza virus, designated type A, was isolated. The second major type, designated type B, was identified in 1940, and the third, type C, in 1949.
The three types of influenza viruses are classified in the virus family Orthomyxoviridae, and they are similar, but not identical, in structure and morphology. Types A and B are more similar in physical and biologic characteristics to each other than they are to type C; and unlike type C viruses, they are responsible for widespread illness during epidemics. Influenza viruses may be spherical or filamentous in shape, and they are of medium size among the common viruses of humans. There are two important viral protein antigens on the virion surface, hemagglutinin and neuraminidase. Hemagglutinin functions in the attachment of the virus to cell receptors and initiates infection by anchoring the virus to respiratory epithelial cells. Neuraminidase is an enzyme that facilitates release of virus from infected cells and aids in the spread of virus in the respiratory tract. See ANTIGEN. Influenza type A viruses are divided into subtypes based on the hemagglutinin and neuraminidase proteins. There are 16 different hemagglutinin subtypes and 9 neuraminidase subtypes. Currently there are two influenza A subtypes circulating among humans, but all hemagglutinin and neuraminidase subtypes have been found in wild birds, which are believed to be the natural reservoir for all influenza A subtypes. Influenza type B does not have an animal reservoir and is not divided into subtypes. Antigenic drift and shift. Influenza type A and B viruses undergo changes over time due to an accumulation of point mutations in the viral proteins, especially hemagglutinin and to a lesser extent neuraminidase. This is referred to as antigenic drift, and necessitates changes to at least one of the three virus strains in the influenza vaccine almost every year. These changes are also responsible for the fact that people remain susceptible to influenza infections during their lives, since antibodies produced in response to infection with a particular influenza strain may not recognize the mutated viruses and therefore may provide little or no protection against them as the strains continue to mutate. Influenza viruses that currently cause epidemics among humans are influenza types A(H1N1), A(H3N2), and B. See MUTATION. Influenza A viruses may also undergo sudden changes known as antigenic shift, which results in a virus with a new hemagglutinin, neuraminidase, or both. When this occurs, a new virus subtype emerges to which most or all people have no immunity, resulting in a worldwide influenza epidemic, or pandemic. Influenza pandemics occurred in 1918–1919, 1957– 1958, and 1968–1969. The 1918–19 pandemic (the Spanish influenza pandemic) was by far the most severe, and is estimated to have caused 20–40 million deaths worldwide with 675,000 in the United States alone. The reasons for the unusually severe illnesses and high death rates during this pandemic have long puzzled influenza researchers, and for many years little was known about the characteristics of the causative virus. However, in recent years a team of scientists has recovered influenza virus genetic material (RNA) from formalin-fixed autopsy samples
taken from soldiers who died of influenza during the 1918–1919 pandemic and from frozen lung tissues of a person who died of influenza and was buried in the permafrost (perennially frozen ground) in Alaska in 1918. The genetic material recovered from these sources has been used to reconstruct the 1918 Spanish influenza pandemic strain. Studies of this reconstructed virus will greatly enhance the understanding of that pandemic virus, and perhaps will help scientists better detect viruses with pandemic potential and develop strategies to help mitigate the effects of pandemic influenza in the future. See EPIDEMIC. Animal influenza. In addition to infecting humans, strains of influenza A infect many other species, including pigs, horses, seals, whales, and a wide variety of avian species. The severity of influenza infection in animals ranges from asymptomatic to fatal. Animal influenza viruses are believed to play an important role in the emergence of human pandemic strains. One way that antigenic shift can occur is when a human and an animal influenza A virus strain infect the same host. When a cell is infected by two different type A viruses, various combinations of the original parental viruses may be assembled into the new progeny; thus, a progeny virus may be a mixture of gene segments from each parental virus and therefore may gain a new characteristic, for example, a new hemagglutinin. Hence a new virus is introduced into a population that has no immunity against it. Since pigs are susceptible to infection with both human and avian influenza strains, this type of antigenic reassortment could occur when a pig is simultaneously infected with human and avian influenza strains. This can happen in settings where people, pigs, and a variety of avian species live in proximity, a circumstance common in many parts of the world. Since in recent years it has been observed that humans can also become infected with some strains of avian influenza viruses, it is also theoretically possible for reassortment to occur if a human is simultaneously infected with an avian and a human influenza virus. The first human illnesses associated with an outbreak of a highly pathogenic A(H5N1) virus in birds were reported in 1997. This virus was widespread among poultry throughout live bird markets in Hong Kong, and led to 18 known human illnesses and 6 deaths. Further spread of this virus in Hong Kong was prevented by destroying 1.5 million birds. In 2003 the A(H5N1) virus was discovered in poultry in Vietnam and has since spread to other countries in Asia, Europe, and Africa, where it has been found in poultry and wild birds. By July 2006 a total of 231 cases in humans had been confirmed in 10 countries, of which 133 were fatal. Although most human cases of "bird flu" are believed to have been contracted by contact with infected birds, it is possible that the virus might change, either by reassortment or mutation of the viral genes, and become easily transmitted from human to human, causing a pandemic. This has prompted governments and health agencies
to rush development of vaccines and to develop pandemic preparedness plans as a precaution, as well as to intensify surveillance of this and other avian influenza viruses with similar potential, and to destroy infected poultry when outbreaks are detected. See ANIMAL VIRUS. Pathogenesis. When influenza virus enters the respiratory tract, usually by inhalation of aerosolized (suspended in air) viruses from an infected person, the virus hemagglutinin attaches to epithelial cells and invades the cells. After attachment, the cell’s normal defense mechanisms actually help the virus gain entry and replicate. The cell engulfs the virus and attempts to destroy it with enzymes. However, instead of destroying the virus, the enzymes allow the viral RNA proteins to spill into the cell and move to the cell nucleus. Replication of viral RNA begins rapidly in the nucleus of the host cell. Final assembly of new virus occurs at the plasma membrane, and new viruses spread to nearby cells. The virus has a short incubation period. There is only a period of 1–3 days between infection and illness, and this leads to the abrupt development of symptoms that is a hallmark of influenza infections. The virus is typically shed in the throat for 5–7 days, during which time the sick person can infect others. Complete recovery from uncomplicated influenza usually takes several days to a week, and the individual may feel weak and exhausted for a week or more after the major symptoms disappear. Numerous medical complications can occur as a result of influenza infection. Pulmonary complications include secondary bacterial and primary viral pneumonias. Viral pneumonia is far less common than bacterial pneumonia, but the mortality rate is much higher. However, because it occurs much more frequently, bacterial pneumonia is responsible for many more hospitalizations and deaths during influenza epidemics. Other pulmonary complications of influenza include croup in infants and young children and worsening of chronic lung conditions. Even previously healthy people can show alterations in pulmonary function that can last for weeks after the initial infection. Influenza can also cause inflammation of the heart muscle (myocarditis) or of the sac around the heart (pericarditis), and may worsen chronic heart conditions. It can also cause encephalitis (inflammation of the brain), kidney failure, and Reye’s syndrome, which affects the liver and central nervous system. Reye’s syndrome occurs almost exclusively in children, and has been associated with the use of aspirin to treat influenza (as well as chickenpox). For this reason it is recommended that aspirin substitutes such as acetaminophen be used when treating children for suspected viral infections. See ASPIRIN; HERPES; PNEUMONIA. Prevention and control. Vaccination is the primary means of preventing influenza. Because at least one of the three virus strains in the influenza vaccine is changed during most years due to viral mutations, and because vaccine-induced antibody can wane over the course of a year, the vaccine must be taken
every year before the influenza season. Currently there are two types of influenza vaccines. One is made from inactivated, or killed, viruses, and the other from living viruses that have been attenuated, or weakened. Anyone over 6 months of age can take the inactivated vaccine, but the live attenuated vaccine is recommended only for those between the ages of 5 and 49 years who are not pregnant. The inactivated vaccine is administered by injection, but the live attenuated vaccine is delivered as a nasal spray. With both vaccines, antibodies develop within about 2 weeks of vaccination, but the live virus vaccine may better stimulate the cell-mediated immune response and provide broader and longer-lasting immunity, especially in children and younger adults. How effective the vaccine is in preventing influenza also depends on how healthy the immune system of the recipient is, and on how closely the virus strains in the vaccine resemble those that circulate during the influenza season. Since the vaccine strains must be chosen well in advance of the influenza season, one or more viruses may continue to mutate to an extent that causes the vaccine to be less effective. However, even when this happens, people who have been vaccinated usually have a shorter and less severe illness if they are infected, and are less likely to develop complications than people who were not vaccinated. Antiviral medications can be used to prevent or to treat influenza. These drugs act by inhibiting replication of the virus and, to prevent influenza, must be taken daily during the period of possible exposure. The drugs can also decrease the severity and lessen the duration of the illness if taken soon after symptoms begin. Two of these drugs, amantadine (Symmetrel) and rimantadine (Flumadine), classed as M2 inhibitors (since they inhibit viral protein M2, which uncoats the virus's protein shell, a process needed for viral replication in the host cell), are effective only against type A viruses. While the drugs were used effectively for many years, influenza viruses can develop resistance during treatment, and in recent years so many resistant strains have been detected that the two are no longer the drugs of choice. Newer drugs, Oseltamivir (Tamiflu) and Zanamivir (Relenza), classified as neuraminidase inhibitors, are used to treat influenza with relatively few problems of resistance to date. These drugs are active against both type A and type B viruses. See BIOTECHNOLOGY; VACCINATION. John M. Quarles; Nancy H. Arden Bibliography. J. A. Cabezas et al., New data on influenza virus type C confirm its peculiarities as a new genus, Intervirology, 32:325–326, 1991; R. B. Couch et al., Influenza: Its control in persons and populations, J. Infect. Dis., 153:431–440, 1986; J. M. Katz, Preparing for the next influenza pandemic, ASM News, 70:412–419, 2004; E. D. Kilbourne, Influenza, 1987; M. W. Shaw, N. H. Arden, and H. F. Maassab, New aspects of influenza viruses, Clin. Microbiol. Rev., 5:74–92, 1992; T. M. Tumpey et al., Characterization of the reconstructed 1918 Spanish influenza pandemic virus, Science, 310:77–80, 2005.
Information management The functions associated with managing the information assets of an enterprise, typically a corporation or government organization. Increasingly, companies are taking the view that information is an asset of the enterprise in much the same way that a company’s financial resources, capital equipment, and real estate are assets. Properly employed, assets create additional value with a measurable return on investment. Forward-looking companies carry this view a step further, considering information as a strategic asset that can be leveraged into a competitive advantage in the markets served by the company. In some respects, the term “information management” is interchangeable with management information systems (MIS); the choice of terminology may reflect the attitude of a company regarding the value of information. Companies using the traditional MIS descriptor are likely to hold the view that management information systems represent a cost to be minimized, in contrast to the strategic-asset view with the objective of maximizing return on investments. For the large enterprise with multiple business units and functional elements distributed over a broad geographic area, information management can be an enormously complex task. Its successful accomplishment is dependent not only on the diligent application of professional skills, but even more on leadership and management abilities strongly rooted in a thorough knowledge of and insight into the enterprise. Scope. The scope of the information management function may vary between organizations. As a minimum, it will usually include the origination or acquisition of data, its storage in databases, its manipulation or processing to produce new (value-added) data and reports via application programs, and the transmission (communication) of the data or resulting reports. While many companies may include, with good reason, the management of voice communications (telephone systems, voice messaging, and, increasingly, computer-telephony integration or CTI), and even intellectual property and other knowledge assets, this article focuses primarily on the data aspects of information management. Information versus data. Executives and managers frequently complain of “drowning in a flood of data, but starving for information.” There is a significant difference between these two terms. Superficially, information results from the processing of raw data. However, the real issue is getting the right information to the right person at the right time and in a usable form. In this sense, information may be a perishable commodity. Thus, perhaps the most critical issue facing information managers is requirements definition, or aligning the focus of the information systems with the mission of the enterprise. The best technical solution is of little value if the final product fails to meet the needs of users.
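The data–information distinction drawn above can be illustrated with a small sketch in Python; the sample records, field names, and function below are invented purely for illustration and do not describe any particular system. The point is that the same stored data becomes useful information only after it is selected, aggregated, and presented in a form suited to a particular user.

from collections import defaultdict

# Hypothetical raw transaction records (data).
sales_records = [
    {"region": "East", "product": "A", "amount": 1200},
    {"region": "West", "product": "A", "amount": 800},
    {"region": "East", "product": "B", "amount": 450},
]

def regional_summary(records):
    # Aggregate raw records into totals per region (information for a manager).
    totals = defaultdict(int)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)

print(regional_summary(sales_records))  # {'East': 1650, 'West': 800}

A summary of this kind is information for a regional manager at the moment a decision is pending, but it reverts to mere data once that moment has passed, which is the sense in which information can be a perishable commodity.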
Information engineering. One formal approach to determining requirements is information engineering. By using processes identified variously as business systems planning or information systems planning, information engineering focuses initially on how the organization does its business, identifying the lines from where information originates to where it is needed, all within the context of a model of the organization and its functions. While information systems personnel may be the primary agents in the information engineering process, success is critically dependent on the active participation of the end users, from the chief executive officer down through the functional staffs. An enterprise data dictionary is a principal product of the information engineering effort. This dictionary is a complete catalog of the names of the data elements and processes used within the enterprise along with a description of their structure and definition. The dictionary may be part of a more comprehensive encyclopedia, which is typically a formal repository of detailed information about the organization, including the process and data models, along with descriptions of the design and implementation of the organization's information systems applications. A major advantage of the application of information engineering is that it virtually forces the organization to address the entire spectrum of its information systems requirements, resulting in a functionally integrated set of enterprise systems. In contrast, ad hoc requirements may result in a fragmented set of systems (islands of automation), which at their worst may be incompatible, contain duplicate (perhaps inconsistent) information, and omit critical elements of information. Information systems architecture. Information engineering develops a logical architecture for an organization's information systems. Implementation, however, will be within the context of a physical systems architecture. Information engineering ensures functional integration, whereas systems engineering provides for the technical integration of information systems. Together, functional and technical (logical and physical) integration lay the architectural foundation for the integrated enterprise. See SYSTEMS ARCHITECTURE; SYSTEMS ENGINEERING. Open versus proprietary architectures. Many critical decisions are involved in establishing a physical systems architecture for an organization. Perhaps the single most important decision is whether the architecture will be based on a single manufacturer's proprietary framework or on an open systems framework. Although the former may be simpler to implement and may offer less technical risk, it is far more restrictive with regard to choices of equipment. An open systems architecture allows for the integration of products from multiple vendors, with resulting increased flexibility in functional performance and the potential for significant cost savings in competitive bidding. The flexibility to incorporate new products is particularly important in an era of rapid technological advancement,
where new technology may become obsolete in 18 months. Client-server and network computing. One of the most widely used architectural models is the client-server model. In this model, end users, for example, may employ personal computers as clients to request data from an enterprise database resident at another location on a minicomputer or mainframe (the server). Clients and servers interconnect and communicate with one another over networks, from local-area networks (LANs) within a single facility, to wide-area networks (WANs) which may span one or more continents. A variation on this model (sometimes referred to as network computing) employs "thin" clients, with the applications running on servers accessed over the network. See CLIENT-SERVER SYSTEM; DISTRIBUTED SYSTEMS (COMPUTERS); LOCAL-AREA NETWORKS; WIDE-AREA NETWORKS. Operating systems. An extremely important architectural consideration is the choice of operating systems. This selection may determine the options available in acquiring off-the-shelf software packages. Alternatively, an existing (legacy) set of software application programs (normally representing a major investment) may severely constrain the choice of hardware on which the programs will run. The issue here is whether the operating systems are proprietary to a single vendor's hardware or are vendor-independent (such as UNIX). Because of the portability it offers, UNIX (and more recently a derivative of UNIX called LINUX) has become a de facto standard in the industry. Although it is actually proprietary, Microsoft's Windows family of operating systems has also achieved de facto standard status due to its increasingly ubiquitous availability on most major hardware platforms. See OPERATING SYSTEM. Databases. Another major architectural decision is the structure of the databases to be utilized. Most database products follow a hierarchical, relational, or object-oriented paradigm. Although the most widely used structure has been relational, increasing attention is being given to object-oriented systems. Information engineering tends to separate data from applications, while object-oriented methods couple the two. See DATABASE MANAGEMENT SYSTEM; OBJECT-ORIENTED PROGRAMMING. Internet and related technologies. The rapid rise of the Internet since the mid-1990s has profoundly impacted information systems architectures as well as the role of information systems within an organization. With tens of millions of individuals, corporations, and even governments tied in to the Internet, there are few businesses that do not reach out to their customer base via a Web site, thus expanding their information systems architecture to encompass the extended enterprise. Equally important, the same technology that drives the Internet (the Internet Protocol or IP) has made possible intranets, which function like the Internet but are private to the organization. The corporate Web site involves the marketing and public relations organizations as major information systems users, and even the sales organization may be involved, as actual sales are made over the Web,
giving rise to "electronic commerce" (e-commerce). See INTERNET; WORLD WIDE WEB. Software applications. The information engineering process can produce a complete set of requirements for the applications needed to run the enterprise. When all architectural parameters have been determined, unique applications can be developed, preferably employing computer-aided software engineering (CASE) tools to automate the process to the maximum degree possible. In general, however, the cost and time involved in developing custom software, and the even greater cost of supporting (maintaining) that software throughout its life cycle, dispose strongly toward the maximum use of off-the-shelf software packages. See SOFTWARE ENGINEERING. There are many functions that are common to almost any organization, such as personnel (human resources) and finance. Applications that support these and other basic infrastructure functions, such as manufacturing, are readily available off-the-shelf from major vendors in the form of enterprise resource planning (ERP) software packages that have become a major element of the software industry. Although considerable work is required either to configure the vendor's software to match the organization's internal processes, or to alter the processes to match the software, or a combination of the two, the net cost in time and dollars (including maintenance) often makes this option more attractive than developing custom applications from scratch. Functions that are unique to an organization may justify the cost and risk associated with custom software development, but these decisions should not be made without careful analysis. Strategic applications that produce a compelling competitive advantage are most likely to meet this test. Enterprise application integration. The proliferation of Web and e-commerce applications, as well as the trend toward enterprise resource planning systems, has posed an additional major challenge for information managers: successfully integrating these new applications with the enterprise's legacy systems, or enterprise application integration (EAI). See INFORMATION SYSTEMS ENGINEERING; SYSTEMS INTEGRATION. Alan B. Salisbury Bibliography. S. Alter, Information Systems: A Management Perspective, 1999; J. Martin, Information Engineering, 1989; A. Sage, Systems Management for Information Technology and Software Engineering, 1996; T. Wheeler, Open Systems Handbook, 1992.
Information processing (psychology) The coding, retrieval, and combination of information in perceptual recognition, learning, remembering, thinking, problem solving, and performance of sensory-motor acts. Coding. Information is coded in some type of memory by having something in the memory represent the information. In books, recording tapes, and
almost all digital computer memories, the memory consists of a long one-dimensional series of memory locations. The encoding of a piece of information such as the proposition "John plays tennis" consists of imposing the patterns representing "John," "plays," and "tennis" into adjacent locations in the memory. Any other piece of information, such as the proposition "Tennis is a sport," learned at a different time would be coded in a similar manner in a totally different set of locations in memory. The same pattern would be used to represent the concept "tennis," which is common to both propositions, but this pattern would be stored in two separate places, and an information processor using the memory would not have direct access to both facts about "tennis" when it had retrieved one fact. An additional search of memory would be required. Such a memory is called nonassociative, because there are no direct associations linking all instances of the same concept or idea in memory. Human memory is not at all like a tape recorder or computer memory. Human memory is associative. From a coding standpoint, the defining property of an associative memory is the use of specific node encoding, in which each idea is represented only once by a designated node (location). The representative of an idea is a particular location in memory, rather than a particular pattern that can be, and often is, stored in a variety of locations. There are many different types of associative memories; two of these are considered below: horizontal associative memory and vertical associative memory. In a horizontal associative memory, the coding of propositions consists of forming a strong association directly from each concept node to the following concept node (Fig. 1). From a semantic viewpoint, the association from one node to another means "is followed by." Careful study of Fig. 1 reveals the two primary deficiencies of a horizontal associative memory. First, there is no encoding of where one proposition ends and another begins, a difficulty which might be referred to as "run-on propositions." Second, there is a serious associative interference problem. For example, there is no way to know whether John plays tennis, basketball, or both, and similarly for Peter. The ability of an associative memory to integrate information that involves the same concepts can produce excessive and erroneous integration in the case of a horizontal associative memory. Vertical associative memories solve these problems while retaining the desirable feature of specific node encoding. They associate atomic concepts to higher-order chunk (phrase) concepts. These associations mean that the lower-order concept "is a constituent of" the higher-order (phrase or propositional) concept. All of the concepts in a single proposition are associated directly or indirectly to a single higher-order node that represents the entire proposition (Fig. 2). In this way both the run-on proposition and associative interference problems are solved by a hierarchical tree-structure encoding of exactly which concepts belong in each proposition.
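The difference between the two coding schemes can be made concrete with a toy sketch in Python. This is an illustrative model only, not an implementation taken from the psychological literature, and the variable names and chunk labels (P1, P2, P3) are invented for the example.

# Horizontal coding: each concept node records only which concepts follow it.
follows = {}
def encode_horizontal(*concepts):
    for a, b in zip(concepts, concepts[1:]):
        follows.setdefault(a, set()).add(b)

encode_horizontal("John", "plays", "tennis")
encode_horizontal("Peter", "plays", "basketball")
print(follows["plays"])  # {'tennis', 'basketball'}: the interference problem; who plays what?

# Vertical coding: each proposition has its own chunk node whose constituents
# are the concept nodes, so the propositions remain distinct.
chunks = {
    "P1": {"John", "plays", "tennis"},
    "P2": {"Peter", "plays", "basketball"},
    "P3": {"tennis", "is", "a sport"},
}

def chunks_containing(concept):
    # Direct access from a concept to every chunk in which it is a constituent.
    return [name for name, constituents in chunks.items() if concept in constituents]

print(chunks_containing("tennis"))  # ['P1', 'P3']: both facts about tennis, kept separate

In the horizontal scheme the "plays" node merges the two propositions, whereas the chunk nodes preserve exactly which concepts belong together, which is the property attributed above to vertical associative memories.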
Associations are assumed to be unidirectional, that is, from one node to another. Information processing in an associative memory requires the ability to get from a chunk node to its constituents, and vice versa. Semantically, the associations from a chunk to its constituents mean "has as a constituent." Since the meaning of vertical associations in the two directions is logically opposite, it seems likely that these associations are represented in the mind (brain) by structurally dissimilar types of links that function somewhat differently when information is retrieved from a person's memory. Many theories of associative memory assume an even larger variety of links than this, but at least these two types appear to be necessary. Order and context-sensitive coding. There are many chunks whose constituents have no unique ordering, such as the members of a family or the features of many objects. Although the sentences used to communicate propositions to others have a definite word order, the same proposition can be communicated using virtually any ordering of the constituent concepts, provided the proper syntactic plan is used. Semantically, there does not appear to be any unique ordering of the constituents of a proposition. Hence, it is likely that the encoding of propositions is another example of the encoding of an unordered set. However, there are numerous examples of chunks encoded in the mind whose constituents are uniquely ordered, for example, the episodes of a story or the letters in a word. How are these ordered sets coded? A number of possibilities have been suggested, but the one that appears to have the most evidence supporting it is context-sensitive coding. According to this theory, the constituents of a chunk representing an ordered set such as the word "clank" are nodes representing overlapping triples of the component segments. In the case of "clank" the constituents would be "#Cl," "cLa," "lAn," "aNk," and "nK#." The reason for distinguishing the middle element in each triple is that it is the central or focal element and the two on each side are the context. In speech production, the node "lAn" would control the speech musculature to produce the particular "a" that is preceded by "l" and followed by "n." If horizontal associations are added to the constituents of a chunk as a third type of link in vertical associative memory, then horizontal associations between pairs of context-sensitive nodes will produce a unique ordering of constituents when there is one and no such ordering when there is not. In the "clank" example, sufficient information has been encoded to know the first constituent of the set, given that the entire set is active in the mind due either to bottom-up activation by stimulus presentation of the word "clank" or to top-down activation of the segmental constituents by the word node for "clank." In both recognition and production of the word "clank," there is only one segmental node of the form "#Xy." Since "#" represents the junction between words, this must be the first node in the word. Given that "#Cl" is the first node, "cLa" will have the strongest horizontal association
Fig. 1. Horizontal associative memory with a serious associative interference problem (who plays tennis—John, Peter, or both?) and no encoding of where each proposition begins and ends.
from "#Cl," so one knows it is the second node, and so forth. Learning. All learning in an associative memory can be considered to be the strengthening of different associations between nodes. Thinking of a set of ideas at about the same time (within approximately a second) activates the nodes representing the ideas at about the same time. In a way that is not understood, this strengthens vertical associations gluing these idea nodes to higher-order chunk nodes and probably also strengthens horizontal associations among the idea nodes. If a set of idea nodes is simultaneously activated, then all horizontal associations are equally strong on the average (all orders are encoded), and no unique ordering is encoded. If the set of idea nodes is activated in a simple linear order, then a unique set of horizontal associations will be encoded to represent the sequence (ordered set). For certain purposes it may be useful to distinguish among three types of learning: chunking (forming
Fig. 2. Vertical associative memory where concepts are grouped into phrases and propositions by associating each concept to a higher-order node that represents the entire group.
new vertical associations from a set of old nodes to a new node, specifying that node to represent the set of old nodes); sequencing (forming new horizontal associations between adjacent pairs of nodes in an ordered set); and strengthening (increasing the strength of existing associations). Forming new associations refers to the initial increase in strength above the baseline, or zero, value, so all learning consists of the strengthening of associations in the most general sense. However, the initial strengthening of associations appears to require more thinking than does the further strengthening of previously learned associations. Chunking and sequencing are greatly facilitated by coding new material so as to integrate it with stored material as much as seems relevant and as consistently as possible. When such integration uses relationships that are true and of considerable general value, it is called understanding. When such integration uses fictional relationships or relationships of little general value, it is called a mnemonic trick. Either way, new learning is made easier by integration with prior knowledge. The difference is that all associations formed or strengthened by understanding are true and useful, as opposed to only a few of the associations formed or strengthened by mnemonic tricks. Further strengthening of learned associations requires much less thinking effort—just retrieval of the associations to be rehearsed. However, this does not mean that such review is of no value. Review can serve to counteract the effects of prior forgetting, to overlearn so as to retard future forgetting, to increase retrieval speed for the reviewed material in the future, and to reduce the attentional demands of such retrieval. Retrieval. Information that is currently being thought about is said to be in active memory. All of the rest of the information that is stored in long-term memory is said to be in passive memory. Retrieval refers to the cognitive processes that allow one to make use of stored information by converting the memory traces for particular pieces of information from the passive to the active state. Retrieval can be analyzed into two components: recognition, which is the activation of the nodes in memory that encode the cue information, and recall, which is the activation of stored information associated with those cue nodes. Every act of retrieval requires recognition of the retrieval cues, and probably every act of retrieval involves recall of some information beyond that given in the retrieval cues. The basic information-retrieval process in an associative memory is direct-access, that is, direct activation of the nodes encoding the cue and closely associated information. No search is required when the retrieval cues are sufficient to identify a unique memory node (location). When the retrieval cues are inadequate, there is sometimes information stored in long-term associative memory that would be adequate to retrieve a unique answer to the question. In such cases, a sequence of direct-access retrieval processes is initiated. For example, sometimes the association from an acquaintance's face to his or her name is too weak to retrieve the name from the
face cues. However, when the backward associations from the name to the face cues are added in, the information stored in memory is sufficient to recognize which name goes with that face. So one can engage in a memory search process that consists of recalling various names and judging the strength of association of each name with the face. The entire sequence is a search process, but each elementary act of retrieval in this complex sequence is a direct-access process. The whole purpose of having an associative memory is to make direct-access retrieval possible, by encoding every idea in one specific location that can be identified from the cue input without having to search through sets of alternative locations. Retrieval dynamics. While retrieval from memory is direct, it is not instantaneous. One elementary act of retrieval requires approximately 1 s of time to reach completion, with a range from 0.3 to 3 s. Retrieval can be analyzed into three temporal phases: latent, transition, and terminal (Fig. 3). The latent phase extends from the time the retrieval cues are first presented until the initial time (I) that any response can be made which is at all influenced by the memory traces activated by the retrieval cues. The terminal phase is the period of time after retrieval is complete during which the retrieved set of memory traces is fully activated and able to control behavior and thought. In the terminal phase, retrieved memory strength is equal to the limit set by the passive strength level stored in long-term memory (L). The transition phase is the period in between, when retrieved memory strength increases from zero to L. This transition phase is not abrupt, as if retrieval occurred all at once. Instead, memory retrieval appears to occur incrementally over a period of many tenths of a second. If people respond to retrieval cues after some retrieval has occurred, but before the retrieval of the memory trace is complete, they respond at a level of accuracy which is greater than chance but less than they could achieve had they delayed their response until retrieval was complete. A plot
[Figure 3 plots retrieved memory strength S against retrieval time T according to S = L(1 − e^(−R(T − I))), with curves for I = 0.4 s, R = 2, and limits L1 = 4 and L2 = 3.5; the 95%-accuracy level is reached at T1 = 1.3 s and T2 = 1.9 s.]
Fig. 3. Memory-retrieval functions for conversion of stored long-term memory strength L into active short-term memory strength S as a function of retrieval time T. Note that the time to achieve 95% accuracy is greatly affected by stored strength L, even when retrieval dynamics parameters (I and R) are identical.
of accuracy as a function of reaction time is called a speed-accuracy tradeoff function. The accuracy of a recognition or recall response to retrieval cues measures the strength of the retrieved memory trace. The increase in the strength of the retrieved memory trace as a function of the time allowed for processing the retrieval cues (retrieval time) is called the retrieval function. Empirical speed-accuracy tradeoff functions for recognition or recall thus provide measures of the theoretical retrieval functions for these tasks. The dynamics of memory retrieval appears to be closely approximated by an exponential approach to a limit L at rate R after the initial delay time I (Fig. 3). Exponential approach to a limit means that in every very small increment of time (after delay time I), a constant percentage of what stored trace remains to be retrieved is retrieved. That percentage is the retrieval rate R. In Fig. 3, two retrieval functions are displayed that have identical retrieval dynamics (identical I and R parameters) but different limits set by their different levels of stored strength in long-term memory (L1 and L2). Note that the time required for memory retrieval to reach a certain level of accuracy (95% in Fig. 3) is considerably less for the trace with the greater strength in storage (T1 < T2). Thus, even without altering the dynamics of memory retrieval, one can decrease reaction time considerably in recall or recognition simply by increasing the strength of the stored trace in long-term memory, with no loss in accuracy. Retrieval dynamics (I and R parameters) is not affected by memory strength. It is primarily affected by the hierarchical level of encoding of the memory trace being retrieved—the higher the coding level (that is, the longer the chain of upward vertical associations from the sensory periphery to the highest level chunk node), the slower the retrieval dynamics (that is, the greater the I value and the smaller the R value). Thus, the retrieval dynamics for recognizing letters is faster than that for words, which is in turn faster than that for sentences. The more simply one codes material in terms of hierarchical depth, the faster it can be retrieved, provided the strength of the trace is the same in both cases. Attention and parallel processing. There are limits to how many things a person can attend to at once. The laws of attention are not completely understood, but the following appear to be true: A very large number of retrieval cues can be processed at once, provided they all converge on a single stored memory trace, regardless of complexity, which has been previously unified by vertical associations to a single chunk node at the top of its trace hierarchy; this is called convergent parallel processing. Two or three separate sets of retrieval cues can probably be processed, provided that the traces they retrieve are very strong; this is called divergent parallel processing. Divergent parallel processing is easier if the traces are stored in different memory modalities, for example, verbal versus spatial. Thus, a person can carry on a conversation and sign letters simultaneously, with little loss of effectiveness in either task, because the memory
traces are in somewhat different modalities and the signature trace is very strong. One of the important aspects of increasing skill in sports, for instance, is the increasing ability to perform several information-processing tasks in parallel. Part of this skill is due simply to increased practice at each component skill and part is due to explicit coordination of these components under unified, higher-order chunk nodes. See COGNITION; MEMORY. Wayne A. Wickelgren Bibliography. J. R. Anderson, Cognitive Psychology and Its Implications, 4th ed., 1999; K. Haberlandt, Cognitive Psychology, 2d ed., 1996; D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing, 1987; A. Trehub, The Cognitive Brain, 1991.
Information systems engineering The process by which information systems are designed, developed, tested, and maintained. The technical origins of information systems engineering can be traced to conventional information systems design and development, and the field of systems engineering. Information systems engineering is by nature structured, iterative, multidisciplinary, and applied. It involves structured requirement analyses, functional modeling, prototyping, software engineering, and system testing, documentation, and maintenance. Information systems. Modern information systems solve a variety of data, information, and knowledge-based problems. In the past, most information systems were exclusively data-oriented; their primary purpose was to store, retrieve, manipulate, and display data. Application domains included inventory control, banking, personnel record keeping, and the like. The airline reservation system represents the quintessential information system of the 1970s. Since then, expectations as to the capabilities of information systems have risen considerably. Information systems routinely provide analytical support to users. Some of these systems help allocate resources, evaluate personnel, and plan and simulate large events and processes. This extremely important distinction defines the range of application of modern information systems. The users expect information systems to perform all the tasks along the continuum shown in Fig. 1. In the 1980s the applications expanded to data-oriented and analytical computing. Systems engineering. Systems engineering is a field of inquiry unto itself. There are principles of applied systems engineering, and a growing literature defines a field that combines systems analysis, engineering, and economics. Systems engineering extends over the entire life cycle of systems, including requirement definitions, functional designs, development, testing, and evaluation. The systems engineer's perspective is different from that of the product engineer, software designer, or technology developer. The product engineer deals with detail, whereas the systems engineer takes an overall viewpoint. Where the product engineer deals with internal operations,
Fig. 1. Data-oriented and analytical computing, suggesting the range of information systems applications. The figure arranges physical tasks (file, store, retrieve, sample), communicative tasks (instruct, inform, request, query), perceptual tasks (search, identify, classify, categorize), and mediational tasks (plan, evaluate, prioritize, decide) along an analytical complexity continuum from data-oriented to analytical computing.
Where the product engineer deals with internal operations,
the systems engineer deals more extensively with the external viewpoint, including relations to other systems, users, repairers, and managers. Systems engineering is based upon the traditional skills of the engineer combined with additional skills derived from applied mathematics, psychology, management, and other disciplines. The systems engineering process is a logical sequence of activities and decisions that transform operational needs into a description of system performance configuration. See SYSTEMS ENGINEERING. Figure 2 is a flowsheet for the design, development, and testing of information systems. The process is by nature iterative and multidisciplinary. Modern information systems problems are frequently analytical rather than data-oriented, and therefore require an iterative design, as the feedback loops in Fig. 2 suggest. Requirements modeling and prototyping. The first and most important step is the identification of user requirements, on which software requirements are based; the two are by no means the same. Requirements analysis may be based on questionnaires and surveys, interviews and observation, and simulations and games. Questionnaire and survey-based requirements analysis includes the use of importance-rating questionnaires, time-estimate questionnaires, and Delphi and policy capture techniques. Questionnaire and survey methods assume that insight into user requirements can be obtained from indirect queries. Interview and field observation requirements analysis assumes that requirements can be determined by asking users what they do and how, and by observing them at the tasks that the system is to perform. These methods include the use of structured and unstructured interviews, the ad hoc convening of working groups, the use of critical incident techniques, and formal job analysis. Simulation and gaming methods include the use of scenario-driven simulations of the functions of the system. Some simulations are paper-based, others are computer-based, and still others combine human and computer-based aspects of a problem. See SIMULATION. Requirements analysis extends to the users of the system and the organization it is intended to support. Figure 3 presents a three-dimensional requirements matrix comprising tasks, users, and organizational
characteristics needed by information systems engineers. Identified requirements should be modeled for validation; Fig. 2 suggests several methods. Popular techniques offer users a working demonstration of the fully programmed system. This is referred to as prototyping and may result in evolutionary and temporary systems. When requirements can be specified with some confidence, it is often possible to develop an evolutionary prototype that can be improved with time. When requirements are poorly defined, prototypes may be discarded repeatedly until requirements are clarified. See MODEL THEORY; PROTOTYPE. System sizing. Once requirements have been modeled and validated, the information system concept can be specified. This requires formulation of a user–computer relation that will determine the user–system interaction, the specification of databases, the selection of analytical methods to drive data-oriented or analytical tasks, the identification of pertinent engineering principles, and the specification of hardware. See COMPUTER. An important decision is the selection of a programming language. This may be determined by existing investments, experience of personnel, and limitations of hardware. Sometimes, however, the choice of language can be based upon requirements identified early in the process. Comparative programming languages is an emerging field that helps to make the choice. The growing range of high-level languages includes FORTRAN, C, Basic, COBOL, LISP, Prolog, Pascal, and Ada. See PROGRAMMING LANGUAGES. Software engineering. Several techniques are available for the specification of software requirements and programming. There are data-flow, data-structure, and object-oriented tools and techniques, and a variety of methods for the generation of knowledge structures, modules, interfaces, and control structures. See DATAFLOW SYSTEMS; SOFTWARE ENGINEERING. Testing, documentation, and maintenance. The information systems engineering process calls for testing throughout design and development. Of special importance is the testing of the software to determine if it satisfies user requirements, is consistent with specifications, and works efficiently. Quality assurance, fault tolerance, and redundancy techniques are available. See FAULT-TOLERANT SYSTEMS. Information systems documentation consists of system specifications, functional descriptions, user manuals, and training materials. System testing determines if the pieces of the system work harmoniously together; multiattribute utility and traditional cost–benefit methods are available. Systems are tested to determine whether they satisfy user requirements, are cost-effective, and can be maintained. Information systems engineers also develop maintenance plans comprising schedules, procedures, and personnel.
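The multiattribute utility methods mentioned above reduce, in their simplest form, to a weighted sum of attribute scores. The sketch below is illustrative only and is not from the original article; the attribute names, weights, and candidate scores are hypothetical.

```python
# Illustrative multiattribute utility scoring for comparing candidate system
# designs. Attributes, weights, and scores are hypothetical values.

def multiattribute_utility(scores, weights):
    """Weighted additive utility: sum of weight * score over all attributes."""
    return sum(weights[a] * scores[a] for a in weights)

weights = {"meets requirements": 0.5, "cost": 0.3, "maintainability": 0.2}

candidates = {
    "prototype A": {"meets requirements": 0.9, "cost": 0.4, "maintainability": 0.8},
    "prototype B": {"meets requirements": 0.7, "cost": 0.8, "maintainability": 0.6},
}

for name, scores in candidates.items():
    print(name, round(multiattribute_utility(scores, weights), 2))
# prototype A -> 0.5*0.9 + 0.3*0.4 + 0.2*0.8 = 0.73
# prototype B -> 0.5*0.7 + 0.3*0.8 + 0.2*0.6 = 0.71
```

In practice the weights would themselves be elicited from users during requirements analysis rather than assumed, as here.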
Fig. 2. Information systems engineering process, with steps, methods, and products.
Fig. 3. Three-dimensional requirements matrix comprising information about tasks, users, and the organization that the system is intended to support.
Implementation. The introduction of a new information system into a computer-free environment can
create personnel problems. The information systems engineer adheres to procedures for the management of change developed by organizational theorists and sociologists. Support. Information systems must be supported by an infrastructure group. This group is responsible for defining a maintenance and support process and providing the necessary tools to keep the application running and to plan for its modification, and, eventually, its replacement, triggering a whole new instance of the information systems engineering process. Stephen J. Andriole Bibliography. S. J. Andriole, Managing Systems Requirements: Methods, Tools, and Cases, McGraw-Hill, 1996; S. J. Andriole, Storyboard Prototyping for Systems Design: A New Approach to User Requirements Validation and System Sizing, 1989; R. S. Pressman, Software Engineering: A Practitioner's Approach, McGraw-Hill, 5th ed., 2000; A. P. Sage, Systems Engineering, Wiley-Interscience, 1992; A. Solvberg and D. C. Kung, Information Systems Engineering: An Introduction, Springer-Verlag, 1993; K. M. Van Hee, Information Systems Engineering: A Formal Approach, Cambridge University Press, 1994.
Information technology The field of engineering involving computer-based hardware and software systems, and communication systems, to enable the acquisition, representation, storage, transmission, and use of information. Successful implementation of information technology
(IT) is dependent upon being able to cope with the overall architecture of systems, their interfaces with humans and organizations, and their relationships with external environments. It is also critically dependent on the ability to successfully convert information into knowledge. Information technology is concerned with improvements in a variety of human and organizational problem-solving endeavors through the design, development, and use of technologically based systems and processes that enhance the efficiency and effectiveness of information in a variety of strategic, tactical, and operational situations. Ideally, this is accomplished through critical attention to the information needs of humans in problem-solving tasks and in the provision of technological aids, including electronic communication and computer-based systems of hardware and software and associated processes. Information technology complements and enhances traditional engineering through emphasis on the information basis for engineering. Tools. Information technology at first was concerned with the implementation and use of new technologies to support office functions. These technologies have evolved from electric typewriters and electronic accounting systems to include very advanced technological hardware to perform functions such as electronic file processing, accounting, and word processing. Among the many potentially critical information technology–based tools are databases, e-mail, artificial intelligence systems, facsimile transmission (FAX) devices, fourth-generation programming languages, local-area networks (LANs), integrated service digital networks (ISDNs), optical
disk storage (CD-ROM) devices, personal computers, parallel processing algorithms, word processing software, computer-aided software engineering packages, and accounting software. Management information systems and decision support. The development of support through management information systems (MIS) became possible through the use of mainframe computers. These systems have become quite powerful and are used for a variety of purposes, such as scheduling airplane flights and booking passenger seats, and registering university students in classes. As management information systems began to proliferate, it soon was recognized that, while they were very capable of providing support for organizing data and information, they did not necessarily provide much support for tasks involving human judgment and choice. These tasks range from providing support in assessing situations, such as better detection of issues or faults, to supporting diagnosis in order to enable the identification of likely causative or influencing factors. This capability was provided by linking the database management systems (DBMS) so common in the MIS era with a model base management system (MBMS) capability and a visualization and interactive presentation capability made possible through dialog generation and management systems (DGMS). The resulting systems are generally known as decision support systems (DSS). Continuing developments in microchip technology have led to electronic communications-based networking, a major facet of information technology. Many would argue that the major information technology development in recent times is the Internet. See DECISION SUPPORT SYSTEM; INFORMATION SYSTEMS ENGINEERING. Activities. The knowledge and skills required in information technology come from the applied engineering sciences, especially information, computer, and systems engineering sciences, and from professional practice. Professional activities in information technology and in the acquisition of information technology systems range from requirements definition or specification, to conceptual and functional design and development of communication and computer-based systems for information support. They are concerned with such topics as architectural definition and evaluation. These activities include integration of new systems into functionally operational existing systems and maintenance of the result as user needs change over time. This human interaction with systems and processes, and the associated information processing activities, may take several diverse forms. See REENGINEERING; SYSTEMS ARCHITECTURE; SYSTEMS ENGINEERING. The hardware and software of computing and communications form the basic tools for information technology. These are implemented as information technology systems through use of systems engineering processes. While information technology and information systems engineering do indeed enable better designs of systems and existing organizations, they also enable the design of fundamentally new organizations and systems such as virtual corporations.
Thus, efforts in this area include not only interactivity in working with clients to satisfy present needs but also awareness of future technological, organizational, and human concerns so as to support transition over time to new information technology–based services. Systems integration. Often, it is very difficult to cope with the plethora of new information technology–based support systems. The major reason is the lack of systems integration across the large variety of such products and services. This has led to the identification of an additional role for information technology professionals, one involving support through information systems integration engineering. An information systems integration engineer is responsible for overall systems management, including configuration management, to ensure that diverse products and services are identified and assembled into total and integrated solutions to information systems issues of large scale and scope. There are many contemporary technological issues here. There is a need for open systems architectures, or open systems environments, that provide, for example, for interoperability of applications software across a variety of heterogeneous hardware and software platforms. The key idea here is the notion of open, or public, that is, intended to produce consensus-based developments that will ameliorate difficulties associated with lack of standards and the presence of proprietary interfaces, services, and protocols. See SYSTEMS INTEGRATION. Distributed collaboration. In this network age of information and knowledge, a major challenge is to capture value. Associated with this are a wide range of new organizational models. Distributed collaboration across organizations and time zones is increasingly common. The motivation for such collaboration is the desire to access sources of knowledge and skills not usually available in one place. The result of such changes has been a paradigm shift that has prompted the reengineering of organizations; the development of high-performance business teams, integrated organizations, and extended virtual enterprises; as well as the emergence of loosely structured organizations that have enhanced productivity. See DATA COMMUNICATIONS; INTERNET; LOCAL-AREA NETWORKS; WIDE-AREA NETWORKS. Organizational challenges. There are a number of substantial challenges associated with use of information technology with respect to enhancing the productive efforts of an individual, a group, or an organization. It has been suggested that the command and control model of leadership is poorly suited to the management of organizations where the participants are not bound by traditional incentive and reward systems. A collaborative effort has to continue to make sense and to provide value to participants for it to be sustained. Otherwise, knowledge and skills are quite portable, and the loss of knowledge workers poses a major potential risk for organizations. Decrease in costs. The information technology revolution is driven by technology and market
considerations. Information technology costs declined in the 1990s due to the use of such technologies as broadband fiber optics, spectrum management, and data compression. The power of computers continues to increase; the cost of computing declined by a factor of 10,000 from 1975 to 2000. Large central mainframe computers have been augmented, and in many cases replaced, by smaller, powerful, and more user-friendly personal computers. There has, in effect, been a merger of the computer and telecommunications industries into the information technology industry, and it now is possible to store, manipulate, process, and transmit voice, digitized data, and images at very little cost. Benefits. As a consequence of the information technology revolution, information and knowledge have become powerful factors for socioeconomic development on a global scale. This knowledge has the potential to provide comprehensive support for enhanced production of goods and services, educational and employment opportunities for all peoples, institutions and infrastructures that enable better management in the private sector and governance in the public sector, natural resource conservation and environmental preservation, and global sustainable development. There are a number of complex issues that require attention, such as the impact of the information technology revolution on social, ethical, cultural, and family values. Most importantly, the benefits of the information technology revolution are overwhelmingly large and lead to new opportunities for world progress through the dissemination of sustainable development knowledge. Indeed, perhaps sustainable human development is not a realistic possibility at this time without major reliance on the support provided by information technology. Andrew P. Sage Bibliography. B. H. Boar, Aligning Information Technology with Business Strategy, John Wiley, New York, 1994; S. P. Bradley and R. L. Nolan, Sense and Respond: Capturing Value in the Network Age, Harvard Business School Press, Boston, 1998; F. Cairncross, The Death of Distance: How the Communications Revolution Will Change Our Lives, Harvard Business School Press, Boston, 1997; M. Dertouzos, What Will Be: How the New World of Information Will Change Our Lives, Harper Collins, San Francisco, 1997; P. F. Drucker, Managing in a Time of Great Change, Dutton, New York, 1995; R. K. Lester, The Productive Edge: How U. S. Industries are Pointing the Way to a New Era of Economic Growth, W. W. Norton, New York, 1998; D. Tapscott and A. Caston, Paradigm Shift: The New Promise of Information Technology, McGraw-Hill, New York, 1993.
Information theory A branch of communication theory devoted to problems in coding. A unique feature of information theory is its use of a numerical measure of the amount of information gained when the contents of a mes-
sage are learned. Information theory relies heavily on the mathematical science of probability. For this reason the term information theory is often applied loosely to other probabilistic studies in communication theory, such as signal detection, random noise, and prediction. See ELECTRICAL COMMUNICATIONS. Information theory provides criteria for comparing different communication systems. The need for comparisons became evident during the 1940s. A large variety of systems had been invented to make better use of existing wires and radio spectrum allocations. In the 1920s the problem of comparing telegraph systems attracted H. Nyquist and R. V. L. Hartley, who provided some of the philosophy behind information theory. In 1948 C. E. Shannon published a precise general theory which formed the basis for subsequent work in information theory. See MODULATION; RADIO SPECTRUM ALLOCATION. In information theory, communication systems are compared on the basis of signaling rate. Finding an appropriate definition of signaling rate was itself a problem, which Shannon solved by the use of his measure of information, to be explained later. Of special interest are optimal systems which, for a given set of communication facilities, attain a maximum signaling rate. Optimal systems provide communication-systems designers with useful absolute bounds on obtainable signaling rates. Although optimal systems often use complicated and expensive encoding equipment, they provide insight into the design of fast practical systems. Communication systems. In designing a one-way communication system from the standpoint of information theory, three parts are considered beyond the control of the system designer: (1) the source, which generates messages at the transmitting end of the system, (2) the destination, which ultimately receives the messages, and (3) the channel, consisting of a transmission medium or device for conveying signals from the source to the destination. Constraints beyond the mere physical properties of the transmission medium influence the design. For example, in designing a radio system only a given portion of the radio-frequency spectrum may be available. The transmitter power may also be limited. If the system is just one link in a larger system which plans to use regenerative repeaters, the designer may be restricted to pulse-transmission schemes. All such conditions are considered part of the description of the channel. The source does not usually produce messages in a form acceptable as input by the channel. The transmitting end of the system contains another device, called an encoder, which prepares the source’s messages for input to the channel. Similarly the receiving end of the system will contain a decoder to convert the output of the channel into a form recognizable by the destination. The encoder and decoder are the parts to be designed. In radio systems this design is essentially the choice of a modulator and a detector. Discrete and continuous cases. A source is called discrete if its messages are sequences of elements (letters)
taken from an enumerable set of possibilities (alphabet). Thus sources producing integer data or written English are discrete. Sources which are not discrete are called continuous, for example, speech and music sources. Likewise, channels are classified as discrete or continuous according to the kinds of signals they transmit. Most transmission media (such as transmission lines and radio paths) can provide continuous channels; however, constraints (such as a restriction to use pulse techniques) on the use of these media may convert them into discrete channels. The treatment of continuous cases is sometimes simplified by noting that a signal of finite bandwidth can be encoded into a discrete sequence of numbers. If the power spectrum of a signal s(t) is confined to the band zero to W hertz (cycles per second), then Eq. (1) applies:

s(t) = Σ_{n=−∞}^{∞} s(n/2W) sin[2πW(t − n/2W)] / [2πW(t − n/2W)]    (1)

Equation (1) reconstructs s(t) exactly from its sample values (Nyquist samples), at discrete instants (2W)^−1 s apart. Thus, a continuous channel which transmits such signals resembles a discrete channel which transmits Nyquist samples drawn from a large finite set of signal levels and at the rate of 2W samples per second. Noiseless and noisy cases. The output of a channel need not agree with its input. For example, a channel might, for secrecy purposes, contain a cryptographic device to scramble the message. Still, if the output of the channel can be computed knowing just the input message, then the channel is called noiseless. If, however, random agents make the output unpredictable even when the input is known, then the channel is called noisy. See COMMUNICATIONS SCRAMBLING; CRYPTOGRAPHY. Encoding and decoding. Many encoders first break the message into a sequence of elementary blocks; next they substitute for each block a representative code, or signal, suitable for input to the channel. Such encoders are called block encoders. For example, telegraph and teletype systems both use block encoders in which the blocks are individual letters. Entire words form the blocks of some commercial cablegram systems. The operation of a block encoder may be described completely by a function or table showing, for each possible block, the code that represents it. It is generally impossible for a decoder to reconstruct with certainty a message received via a noisy channel. Suitable encoding, however, may make the noise tolerable, as may be illustrated by a channel that transmits pulses of two kinds. It is customary to let binary digits 0 and 1 denote the two kinds of pulse. If the source has only the four letters A, B, C, D, it is possible to simply encode each single-letter block into a pair of binary digits (code I; see table). In that case the decoder would make a mistake every time that noise produced an error.
Three possible binary codes for four-letter alphabet

Letter    Code I    Code II    Code III
A         00        000        00000
B         01        011        00111
C         10        101        11001
D         11        110        11110
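A short sketch (Python, not part of the original article) makes the behavior of code III concrete; it applies the single-error-correction rule described in the text that follows, namely picking the code word that agrees with the received digits in as many places as possible.

```python
# Code III from the table: four 5-digit code words with minimum Hamming
# distance 3, so a single error can be corrected (or two errors detected).
CODE_III = {"A": "00000", "B": "00111", "C": "11001", "D": "11110"}

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def decode(received):
    """Return the letter whose code word is closest to the received digits."""
    return min(CODE_III, key=lambda letter: hamming_distance(CODE_III[letter], received))

sent = CODE_III["C"]        # "11001"
received = "11011"          # noise flips one digit
print(decode(received))     # prints "C": the single error is corrected
```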
If each single-letter block is encoded into three digits (code II), the decoder can at least recognize that a received triple of digits must contain errors if it is one of the triples (001, 010, 100, or 111) not listed in the code. Because an error in any one of the three pulses of code II always produces a triple that is not listed, code II provides single-error detection. Similarly, a five-binary-digit code (code III) can provide double-error detection, because errors in a single pulse or pair of pulses always produce a quintuple that is not listed. As an alternative, code III may provide single-error correction. In this usage, the decoder picks a letter for which code III agrees with the received quintuple in as many places as possible. If only a single digit is in error, this rule chooses the correct letter. Even when the channel is noiseless, a variety of encoding schemes exists and there is a problem of picking a good one. Of all encodings of English letters into dots and dashes, the Continental Morse encoding is nearly the fastest possible one. It achieves its speed by associating short codes with the most common letters. A noiseless binary channel (capable of transmitting two kinds of pulse 0, 1, of the same duration) provides the following example. In order to encode English text for this channel, a simple encoding might just use 27 different five-digit codes to represent word space (denoted by #), A, B, . . . , Z; say # 00000, A 00001, B 00010, C 00011, . . . , Z 11011. The word #CAB would then be encoded into 00000000110000100010. A similar encoding is used in teletype transmission; however, it places a third kind of pulse at the beginning of each code to help the decoder stay in synchronism with the encoder. The five-digit encoding can be improved by assigning four-digit codes 0000, 0001, 0010, 0011, 0100 to the five most common letters #, E, T, A, O. There are 22 quintuples of binary digits which do not begin with any of the five four-digit codes; these may be assigned as codes to the 22 remaining letters. About half the letters of English text are #, E, T, A, or O; therefore the new encoding uses an average of only 4.5 digits per letter of message. See TELETYPEWRITER. More generally, if an alphabet is encoded in single-letter blocks, using L(i) digits for the ith letter, the average number of digits used per letter is shown in Eq. (2),

L = p(1)L(1) + p(2)L(2) + p(3)L(3) + · · ·    (2)

where p(i) is the probability of the ith letter. An optimal encoding scheme will minimize L. However, the encoded messages must be decipherable, and this condition puts constraints on the L(i).
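As a rough numerical check of Eq. (2), the short sketch below (an illustration, using only the approximate probability split quoted above) reproduces the 4.5-digit average for the improved English encoding.

```python
# Eq. (2): average digits per letter = sum of p(i) * L(i).
# The five common letters #, E, T, A, O get 4-digit codes and together occur
# about half the time; the remaining 22 letters get 5-digit codes.
p_common, p_rest = 0.5, 0.5
average_length = p_common * 4 + p_rest * 5
print(average_length)   # 4.5 digits per letter
```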
The code lengths of decipherable encodings must satisfy the relationships shown in inequality (3),

2^−L(1) + 2^−L(2) + 2^−L(3) + · · · ≤ 1    (3)

The real numbers L(1), L(2), . . . , which minimize L subject to inequality (3) are L(i) = −log2 p(i), and the corresponding minimum L is shown in Eq. (4),

H = −Σ_i p(i) log2 p(i)    (4)

which
provides a value of H equal to a number of digits per letter. The L(i) must be integers and −log2p(i) generally are not integers; for this reason there may be no encoding which provides L = H. However, Shannon showed that it is always possible to assign codes to letters in such a way that L ≤ H + 1. A procedure for constructing an encoding which actually minimizes L has been given by D. A. Huffman. For (27-letter) English text H = 4.08 digits per letter, as compared with the actual minimum 4.12 digits per letter obtained by Huffman’s procedure. By encoding in blocks of more than one letter, the average number of digits used per letter may be reduced further. If messages are constructed by picking letters independently with the probabilities p(1), p(2), . . . , then H is found to be the minimum of the average numbers of digits per letter used to encode these messages using longer blocks. See DATA COMPRESSION. Information content of message. The information contained in a message unit is defined in terms of the average number of digits required to encode it. Accordingly the information associated with a single letter produced by a discrete source is defined to be the number H. Some other properties of H help to justify using it to measure information. If one of the p(i) equals unity, only one letter appears in the messages. Then nothing new is learned by seeing a letter and, indeed, H = 0. Second, of all possible ways of assigning probabilities p(i) to an N-letter alphabet, the one which maximizes H is p(1) = p(2) = ··· = 1/N. This situation is the one in which the unknown letter seems most uncertain; therefore it does seem correct that learning such a letter provides the most information. The corresponding maximum value of H is log2N. This result seems reasonable by the following argument. When two independent letters are learned, the information obtained should be 2H = 2 log2N. However, such pairs of letters may be considered to be the letters of a larger alphabet of N2 equally likely pairs. The information associated with one of these new letters is log2 N2 = 2 log2 N. Although H given by Eq. (4) is dimensionless, it is given units called bits (a contraction of binary digits). Occasionally the information is expressed in digits of other kinds (such as ternary or decimal). Then bases other than 2 are used for the logarithm in Eq. (4). The majority of message sources do not merely pick successive letters independently. For example in English, H is the most likely letter to follow T but is otherwise not common. The source is imagined
to be a random process in which the letter probabilities change, depending on what the past of the message has been. Statistical correlations between different parts of the message may be exploited by encoding longer blocks. The average number of digits per letter may thereby be reduced below the single-letter information H given by Eq. (4). For example, by encoding English words instead of single letters, 2.1 digits/letter suffice. Encoding longer and longer blocks, the number of digits needed per letter approaches a limiting minimum value. This limit is called the entropy of the source and is interpreted as the rate, in bits per letter, at which the source generates information. If the source produces letters at some fixed average rate, n letters/s, the entropy may also be converted into a rate in bits per second by multiplying by n. The entropy may be computed from tables giving the probabilities of blocks of N letters (N-grams). If in Eq. (4) the summation index i is extended over all N-grams, then the number H represents the information in N consecutive letters. As N → ∞, H/N approaches the entropy of the source. The entropy of English has been estimated by Shannon to be about 1 bit/letter. However, an encoder might have to encode 100-grams to achieve a reduction to near 1 digit/letter. Comparing English with a source that produces 27 equally likely letters independently (hence has entropy log2 27 = 4.8 bits/letter), this result is often restated: English is 80% redundant. See ENTROPY. In computer applications, coding can reduce the redundancy of data for storage, saving space in memory. Large white areas make black-and-white diagrams very redundant. To store diagrams or send them by facsimile, coding white runs instead of individual black-and-white samples will remove much redundancy. See FACSIMILE. Universal codes. Even when a probabilistic description of the source is unavailable, coding can remove redundancy. In a simple illustration, the encoder begins by parsing a binary message into blocks, now of varying lengths. Each new block B is made as long as possible, but B must be a block B, seen earlier, followed by one extra digit x. A typical parsed message is 0,01,00,1,000, . . . , with commas separating the blocks. When transmitting the kth block B, the transmitter first sends the last digit x of B. It next sends a string of digits, of length about log2k, to identify the earlier block B. Redundancy makes block sizes tend to grow faster than log2k; then the encoding reduces message length. A. Lempel and J. Ziv have shown that codes like this one compress data about as well as codes that use given source statistics. In effect, the parsing gathers statistics to help encode future blocks. Capacity. The notion of entropy is more widely applicable than might appear from the discussion of the binary channel. Any discrete noiseless channel may be given a number C, which is called the capacity. C is defined as the maximum rate (bits per second) of all sources that may be connected directly to the channel. Shannon proved that any given source (which perhaps cannot be connected directly to the
channel) of entropy H bits/letter can be encoded for the channel and run at rates arbitrarily close to C/H letters/s. By using repetition, error-correcting codes, or similar techniques, the reliability of transmission over a noisy channel can be increased at the expense of slowing down the source. It might be expected that the source rate must be slowed to 0 bits/s as the transmission is required to be increasingly error-free. On the contrary, Shannon proved that even a noisy channel has a capacity C. Suppose that errors in at most a fraction ε of the letters of the message can be tolerated (ε > 0). Suppose also that a given source, of entropy H bits/letter, must be operated at the rate of at least (C/H) − δ letters/s (δ > 0). No matter how small ε and δ are chosen, an encoder can be found which satisfies these requirements. For example, the symmetric binary channel has binary input and output letters; noise changes a fraction p of the 0's to 1 and a fraction p of the 1's to 0 and treats successive digits independently. The capacity of this channel is shown by Eq. (5),

C = m[1 + p log2 p + (1 − p) log2 (1 − p)]    (5)

where m is the number of digits per second which the channel transmits. Shannon's formula, shown by Eq. (6),

C = W log2 (1 + S/N)    (6)

gives the capacity C of a band-limited continuous channel. The channel consists of a frequency band W Hz wide, which contains a gaussian noise of power N. The noise has a flat spectrum over the band and is added to the signal by the channel. The channel also contains a restriction that the average signal power may not exceed S. Equation (6) illustrates an exchange relationship between bandwidth W and signal-to-noise ratio S/N. By suitable encoding a signaling system can use a smaller bandwidth, provided that the signal power is also raised enough to keep C fixed. See BANDWIDTH REQUIREMENTS (COMMUNICATIONS). Typical capacity values are 20,000 bits/s for a telephone speech circuit and 50,000,000 bits/s for a broadcast television circuit. Speech and television are very redundant and would use channels of much lower capacity if the necessary encodings were inexpensive. For example, the vocoder can send speech, only slightly distorted, over a 2000-bits/s channel. Successive lines or frames in television tend to look alike. This resemblance suggests a high redundancy; however, to exploit it the encoder may have to encode in very long blocks. Not all of the waste in channel capacity can be attributed to source redundancies. Even with an irredundant source, such as a source producing random digits, some channel capacity will be wasted.
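The two capacity formulas are easy to evaluate numerically. The sketch below is illustrative only; the channel parameters chosen (1000 digits/s with error fraction 0.01, and a 3000-Hz band with S/N = 1000) are assumptions for the example, not values from the article.

```python
import math

# Eq. (5): capacity of the symmetric binary channel transmitting m digits/s
# with error fraction p.
def bsc_capacity(m, p):
    if p in (0.0, 1.0):
        return float(m)
    return m * (1 + p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Eq. (6): capacity of a band-limited gaussian channel of bandwidth W hertz
# and signal-to-noise ratio S/N.
def gaussian_capacity(W, snr):
    return W * math.log2(1 + snr)

print(round(bsc_capacity(m=1000, p=0.01)))        # ~919 bits/s
print(round(gaussian_capacity(W=3000, snr=1000))) # ~29,900 bits/s
```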
The simplest encoding schemes provide reliable transmission only at a rate equal to the capacity of a channel with roughly 8 dB smaller signal power (the 8-dB figure is merely typical and really depends on the reliability requirements). Again, more efficient encoding to combat noise generally requires larger-sized blocks. This is to be expected. The signal is separated from the noise on the basis of differences between the signal's statistical properties and those of noise. The block size must be large enough to supply the decoder with enough data to draw statistically significant conclusions. See ELECTRICAL NOISE. Algebraic codes. Practical codes must use simple encoding and decoding equipment. Error-correcting codes for binary channels have been designed to use small digital logic circuits. These are called algebraic codes, linear codes, or group codes because they are constructed by algebraic techniques involving linear vector spaces or groups. For example, each of the binary codes I, II, and III discussed above (see table) contains four code words which may be regarded as vectors C = (c1, c2, . . . , cn) of binary digits ci. The sum C + C′ of two vectors may be defined to be the vector (c1 + c′1, . . . , cn + c′n) in which coordinates of C and C′ are added modulo 2. Codes I, II, and III each have the property that the vector sum of any two code words is also a code word. Because of that, these codes are linear vector spaces and groups under vector addition. Their code words also belong to the n-dimensional space consisting of all 2^n vectors of n binary coordinates. Codes II and III, with n = 3 and 5, do not contain all 2^n vectors; they are only two-dimensional linear subspaces of the larger space. Consequently, in Codes II and III, the coordinates ci must satisfy certain linear homogeneous equations. Code II satisfies c1 + c2 + c3 = 0. Code III satisfies c3 + c4 = 0, c2 + c3 + c5 = 0, c1 + c2 = 0, and other equations linearly dependent on these three. The sums in such equations are performed modulo 2; for this reason the equations are called parity check equations. In general, any r linearly independent parity check equations in c1, . . . , cn determine a linear subspace of dimension k = n − r. The 2^k vectors in this subspace are the code words of a linear code. See GROUP THEORY; LINEAR ALGEBRA; NUMBER THEORY. The r parity checks may be transformed into a form which simplifies the encoding. This transformation consists of solving the original parity check equations for some r of the coordinates ci as expressions in which only the remaining n − r coordinates appear as independent variables. For example, the three parity check equations given for Code III are already in solved form with c1, c4, c5 expressed in terms of c2 and c3. The k = n − r independent variables are called message digits because the 2^k values of these coordinates may be used to represent the letters of the message alphabet. The r dependent coordinates, called check digits, are then easily computed by circuits which perform additions modulo 2. See LINEAR SYSTEMS OF EQUATIONS. At the receiver the decoder can also do additions modulo 2 to test if the received digits still satisfy the parity check equations. The set of parity check equations that fail is called the syndrome because it contains the data that the decoder needs to diagnose the errors.
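A brief sketch (Python, illustrative only) computes the syndrome of code III from the three parity check equations quoted above; note that the digits c1, . . . , c5 of the text are indexed from 0 here.

```python
# Parity checks for code III (text digits c1 ... c5, list indices 0 ... 4):
#   c3 + c4 = 0,  c2 + c3 + c5 = 0,  c1 + c2 = 0   (all sums modulo 2)
CHECKS = [(2, 3), (1, 2, 4), (0, 1)]

def syndrome(word):
    """Return 1 for each parity check that fails and 0 for each that holds."""
    return [sum(word[i] for i in positions) % 2 for positions in CHECKS]

codeword = [1, 1, 0, 0, 1]      # "11001", the letter C in the table
print(syndrome(codeword))       # [0, 0, 0]: all checks hold
received = [1, 0, 0, 0, 1]      # an error in the second digit (c2)
print(syndrome(received))       # [0, 1, 1]: exactly the checks containing c2 fail
```

As the text notes, the pattern of failed checks depends only on where the errors are, not on which code word was sent.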
The syndrome depends only on the error locations, not on which code word was sent. In general, a code can be used to correct e errors if each pair of distinct code words differ in at least 2e + 1 of the n coordinates. For a linear code, that is equivalent to requiring the smallest number d of "ones" among the coordinates of any code word [excepting the zero word (0, 0, . . . , 0)] to be 2e + 1 or more. Under these conditions each pattern of 0, 1, . . . , e − 1, or e errors produces a distinct syndrome; then the decoder can compute the error locations from the syndrome. This computation may offer some difficulty. But at least it involves only r binary variables, representing the syndrome, instead of all n coordinates. Hamming codes. The r parity check equations may be written concisely as the binary matrix equation (7),

H C^T = 0    (7)

Here C^T is a column vector, the transpose of (c1, . . . , cn). H is the so-called parity check matrix, having n columns and r rows. A Hamming single-error correcting code is obtained when the columns of H are all n = 2^r − 1 distinct columns of r binary digits, excluding the column of all zeros. If a single error occurs, say in coordinate ci, then the decoder uses the syndrome to identify ci as the unique coordinate that appears in just those parity check equations that fail. See MATRIX THEORY. Shift register codes. A linear shift register sequence is a periodic infinite binary sequence . . . , c0, c1, c2, . . . satisfying a recurrence equation expressing cj as a modulo 2 sum of some of the b earlier digits cj−b, . . . , cj−1. A recurrence with two terms would be an equation cj = cj−a + cj−b, with a equal to some integer 1, 2, . . . , or b − 1. The digits of a shift register sequence can be computed, one at a time, by very simple equipment. It consists of a feedback loop, containing a shift register to store cj−b, . . . , cj−1 and a logic circuit performing modulo 2 additions. This equipment may be used to implement a linear code. First, message digits c1, . . . , cb are stored in the register and transmitted. Thereafter the equipment computes and transmits successively the n − b check digits cj obtained from the recurrence equation with j = b + 1, . . . , n. By choosing a suitable recurrence equation, the period of the shift register sequence can be made as large as 2^b − 1. Then, with n equal to the period 2^b − 1, the code consists of the zero code word (0, 0, . . . , 0) and 2^b − 1 other code words which differ from each other only by cyclic permutations of their coordinates. These latter words all contain d = 2^(b−1) "ones" and so the code can correct e = 2^(b−2) − 1 errors. See SWITCHING CIRCUIT. Intermediate codes. The Hamming codes and maximal period shift register codes are opposite extremes, correcting either one or many errors and having code words consisting either mostly of message digits or mostly of check digits. Many intermediate codes have been invented. One of them requires n + 1 to be a power of 2; say n + 1 = 2^q. It then uses at most qe check digits to correct e errors.
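The shift register encoding just described can be sketched in a few lines. The recurrence used here, cj = cj−2 + cj−3, is the one given below under orthogonal parity codes (b = 3, period 7); the layout of three message digits followed by four check digits follows the description above, and the digits are indexed from 0 rather than from c1.

```python
# Maximal-period shift register code with b = 3 and recurrence
# c[j] = c[j-2] + c[j-3] (modulo 2). Three message digits are followed by
# n - b = 4 check digits, giving code words of length n = 7.

def encode(message_bits):                 # message_bits: list of 3 binary digits
    c = list(message_bits)                # c[0], c[1], c[2] are the message digits
    for j in range(3, 7):                 # compute check digits c[3] .. c[6]
        c.append((c[j - 2] + c[j - 3]) % 2)
    return c

# All 2**3 = 8 code words: the zero word plus 7 words that are cyclic shifts
# of one another, each containing four "ones" (d = 2**(b-1) = 4, so e = 1).
for m in range(8):
    bits = [(m >> i) & 1 for i in range(3)]
    print(encode(bits))
```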
Perfect codes. Although each pattern of 0, 1, . . . , e errors produces a distinct syndrome, there may be extra syndromes which occur only after more than e errors. In order to keep the number of check digits small, extra syndromes must be avoided. A code is called perfect if all 2r syndromes can result from patterns of 0, 1, . . . , e − 1, or e errors. Hamming codes are all perfect. M. J. E. Golay found another perfect binary code having n = 23, r = 11 check digits, and correcting e = 3 errors. Orthogonal parity codes. Orthogonal parity codes are codes with especially simple decoding circuits which take a kind of majority vote. Suppose the parity check equations can be used to derive 2e + 1 linear equations in which one digit, say c1, is expressed with each of the remaining digits c2, . . . , cn appearing in at most one equation. If at most e errors occur, then the received digits satisfy a majority of the 2e + 1 equations if and only if c1 was received correctly. For example, the recurrence cj = cj−2 + cj−3 generates a maximal period shift register code with n = 7, r = 4, e = 1. With j = 1, 3, and 4 in the recurrence equation, three of the parity check equations, c1 = c5 + c6, c1 = c3 + c7, and c1 = c2 + c4, are obtained, after using the fact that the shift register sequence has period 7. These three equations are already in the form required for decoding c1 by majority vote. Similar equations, obtained by permuting c1, . . . , c7 cyclically, apply for c2, . . . , c7. Then the decoder can be organized so that most of the equipment used to decode c1 can be used again in decoding c2, . . . , c7. See COMBINATORIAL THEORY. Wide-band signaling. A band-limited continuous channel has capacity C, given by Eq. (6), very near 1.44 WS/N if the signal-to-noise ratio S/N is small. Then high capacity is obtained even with low signal power S if the bandwidth W is wide enough. Wide bands are used in satellite systems and in military applications where a weak signal must hide in background noise or penetrate jamming. Or, many users can share the same wide radio channel effectively even though their signals spread over the entire band and interfere with one another. One method of wide-band signaling modulates a kind of noise carrier, a stream of short pulses of random sign, by changing signs in accordance with the digits of a binary message. The receiver has a synchronous detector that must regenerate the carrier locally in order to demodulate the signal. A carrier kept secret from other listeners can serve as the running cryptographic key of a stream cipher. Noiselike carriers may take the form of long periodic sequences (for example, Gold codes) generated by linear recurrences in the manner of shift-register codes. See AMPLITUDE-MODULATION DETECTOR; SPREAD SPECTRUM COMMUNICATION. Nonblock codes. Many useful codes do not send messages block by block. Simple convolutional codes intersperse check digits among message digits in a regular pattern. An example, designed to correct errors that cluster together in isolated bursts, transmits the binary message . . . , x1, x2, x3, . . . as . . . , c1, x1, c2, x2, c3, x3, . . . , with ci = xi−3 + xi−6 (modulo 2).
It can correct any burst of length 6 or less that lies more than 19 digits away from other errors. Convolutional codes are commonly decoded by sequential or dynamic programming methods such as the Viterbi algorithm. Using the given probabilistic description of the noise process, the decoder calculates the posterior probabilities of messages that might possibly have produced the data received up to the present time. The decoder then chooses the most likely message as its decoded message. The calculation is done digit by digit, using the probabilities computed for the first k received digits to simplify the calculation for k + 1 digits. Although the number of possible messages grows very quickly with k, one can eliminate highly unlikely messages to keep the calculation reasonably short at each step. See ESTIMATION THEORY. For signaling over radio channels with additive gaussian noise, trellis codes, which are particular kinds of nonblock codes, often have an advantage over block codes of equal complexity. Expressed in terms of signal-to-noise ratio, the advantage can be 3 dB or more even for simple codes. Edgar N. Gilbert Bibliography. E. R. Berlekamp, Algebraic Coding Theory, rev. ed., 1984; R. W. Hamming, Coding and Information Theory, 2d ed., 1986; R. W. Lucky, Silicon Dreams: Information, Man, and Machine, 1991; M. Schwartz, Information Transmission, Modulation, and Noise, 4th ed., 1990; N. J. A. Sloane and F. J. MacWilliams, Theory of Error-Correcting Codes, 9th reprint, 1998; C. E. Shannon, Collected Papers, 1993.
Information theory (biology) The application to biological systems of the theory of encoding effects on signal transmission and communications efficiency. In everyday language, information is associated with knowledge, meaning (semantics), and the influence of information on behavior (pragmatics). Information theory considers information to be quantitative and capable of being expressed in binary digits, or bits. The measure of information is based on its structure or its representation, taking into account the statistical properties of its structure. Mathematical information theory was created to describe the transmission of information in telecommunications, but it has been widely applied also in mathematics, physics, and biology. Representation of information. A physical quantity that carries information is called a signal. A signal may be represented in space and time. Examples include speech and radio waves, time-dependent voltage, a still or moving picture, and the pulse sequence within a nerve fiber. In order to calculate the amount of information it contains, a signal must be quantized in a two-step process for symbolic presentation. First, the time or space coordinate has to be discrete, appearing only at precise, usually equidistant points on the coordinates. For signals with limited variations (limited bandwidth), this is possible without a loss
of generality because of the sampling theorem. Second, the amplitude of the signal must be quantized, or expressed in (usually equal) steps. The size of the steps, which is arbitrary, determines the accuracy of the representation, but it is usually chosen at the limit of resolution to detect a certain variation of the amplitude. The signal can then be represented by a finite number of discrete points in time or space and by amplitude levels at each point. An information source is a device that produces signals that carry information. In the symbolic representation of information theory, these signals are discrete in time or space and quantized in amplitude, and the information source is usually described by a stochastic process. Written language is a typical example of the signal sequence produced by a stationary information source. Coding theorem of information. The coding theorem of information shows that it is possible to code the output of an information source with a number of bits equal to its entropy, or uncertainty with respect to the signals of the source. Symbols or symbol sequences with a high probability are represented by a short code word, and those with low probability are represented by a longer code word, much like the Morse alphabet. The coding theorem of information restricts the complexity needed to represent the information produced by an information source, and that restriction is important in telecommunications. Source coding makes use of the redundancy of the information source to reduce the amount of data to be transmitted. In the case of speech transmission with telephone quality, a reduction factor of about 1:10 has been achieved. For a television or picture phone signal, reduction factors of 1:100 or even 1:1000 are hypothesized. See ENTROPY. Information channel. The information channel is the connection of an information source with a receiver. The information source is connected by way of a coder to the channel. The coder transforms the signals of the source into transmissible signals, eliminates redundancy, and reduces the effect of noise introduced by the channel (channel coding). Suitable modulation may be included, and when performed simultaneously with channel coding, is called codulation. The transmission channel may have frequency dispersion (channel with memory). The channel typically introduces noise, which in the simplest case is additive to the signal and has a gaussian amplitude distribution, but the scheme also holds for more general cases, such as multiplicative noise and Poisson distributions. The decoder delivers the decoded signal to the receiver or destination. Entropy at the input and entropy at the output of the channel can then be calculated: if only noise without dispersion is present, perfect transmission without errors or loss of information is possible. Coding theorem of transinformation. The coding theorem of transinformation states that in the limiting case of very long sequences a transmission with vanishing error probability is possible. The ideal case may be approximated by coding very long signal
sequences, which also implies a very long delay between transmitter and receiver. However, error-free transmission is possible even in noisy surroundings. Error-correcting codes can approach the ideal error-free case, and many optimized binary coding schemes exist. The decoder, which can be considered equivalent to a pattern recognition device, may have a very high complexity, but it can be reduced by algorithms. Its task is to classify or detect the desired signal in a set of possible received signals. The human receiver, or decoder, has the same task of pattern recognition, which is performed by the nervous system. See CYBERNETICS. Human information channel. Information theory can be applied to human information processing (Fig. 1). The number of receptors for each of various sensory organs has been determined, and the numerical values of maximum information flow for some of the sensory organs have been estimated. The data given in Fig. 1 show that television and telephone channels are well adapted to human receptors. See INFORMATION PROCESSING (PSYCHOLOGY). Information flow for various conscious human activities can also be estimated. Counting requires 3 bits/s; reading, 40 bits/s; hearing, 40 bits/s; and piano playing, 20 bits/s. The maximum human channel capacity for conscious output is 50 bits/s, which indicates an apparent reduction of information in the central nervous system. Such reduction is necessary because not all details of the external world need to penetrate the consciousness. A good deal of this information is superfluous or redundant and therefore can be eliminated; otherwise memory would be unable to store the entire flow of sensory information. If 50 bits/s is multiplied by a life expectancy of 70 years, the information received totals about 10^10 bits. The storage capacity of human memory is estimated to be of that order of magnitude. See MEMORY.
Controlled source and bidirectional communication. In human communication, a unidirectional information channel is a monologue. The usual situation, however, is a dialogue, which involves bidirectional communication. In bidirectional communication theory, the independent information source is replaced by a controlled or dependent information source. In a dialogue, the transmitter and the receiver are both described by controlled information sources. The behavior of the living being may be described statistically by a dependent information source, in which the controlling input represents the sensory signals and the output is the behavior. The mathematical description of a controlled information source is given by the conditional probability p(x|xnyn), where x is the produced symbol, xn the sequence of previously produced symbols, and yn the sequence of previously received symbols. The expression p(x|xnyn) also represents the generator probability independent of the combination of the transmitted and received sequences xnyn. If behavior changes because of a learning or adaptive process, the probability changes. For the dependent information source, two entropy quantities may be defined. The first is the usual entropy, given by Eq. (1),

H(x) = lim_{n→∞} E[−log p(x|xn)]    (1)

This takes into account only the produced signal sequence xn. If both the produced and the received signal sequences xn and yn are considered, the free entropy is determined by Eq. (2),

F(x) = lim_{n→∞} E[−log p(x|xn yn)]    (2)
Fig. 1. Human information processing. For each sensory input (light, sound, pressure, pain, heat, cold, smell, taste) the figure tabulates the number of receptors, the number of nerve fibers, and the estimated channel capacity feeding the central nervous system (about 10^10 neurons), whose conscious output is about 50 bits/s. The estimates of channel capacity are based on the number of rods in the retina of the eye and the number of hair cells in the inner ear's organ of Corti.
Clearly F(x) ≤ H(x) because it contains the knowledge of the received signals. The difference between the entropies, which is known as directed transinformation y → x, is given by Eq. (3),

T(x|y) = H(x) − F(x)    (3)

where T(x|y) rep
where T(x|y) represents the influence of the controlling (or received) signals on the information production of the source, and is thus the transinformation from the y sequence to the x sequence. The whole entropy H(x) of the source thus consists of a free part F(x) and a dependent part T(x|y) transmitted by the y sequence. A bidirectional communication takes place when two dependent information sources are coupled. The entropy quantities for that case are shown in the table. The sum of the two directed transinformation quantities equals the transinformation calculated by the original information theory. If one of the two transinformations, T(x|y) or T(y|x), is zero, the problem is reduced to the special case of the unidirectional information channel. The two equations for the directed transinformation lead to the flow diagram of the communication shown in Fig. 2. This diagram also shows that only part of the produced entropy H(x), namely T(y|x), reaches the counterpart. Thus the coupling may be small because of noise or other limitations of the communication channels.
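The entropy quantities of Eqs. (1)–(3), collected in the table below, can also be estimated directly from recorded symbol sequences. The following Python sketch is an illustration added here and is not part of the original article: it truncates the conditioning histories x_n and y_n to a single preceding symbol (a first-order stand-in for the n → ∞ limits above), and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """Estimate H(symbol | context) in bits from a list of (context, symbol) pairs."""
    joint = Counter(pairs)
    context_counts = Counter(context for context, _ in pairs)
    total = sum(joint.values())
    h = 0.0
    for (context, symbol), n in joint.items():
        p_joint = n / total                    # empirical p(context, symbol)
        p_cond = n / context_counts[context]   # empirical p(symbol | context)
        h -= p_joint * math.log2(p_cond)
    return h

def directed_transinformation(xs, ys):
    """First-order estimates of H(x), F(x), and T(x|y) = H(x) - F(x), per Eqs. (1)-(3)."""
    h_x = conditional_entropy([(xs[i - 1], xs[i]) for i in range(1, len(xs))])
    f_x = conditional_entropy([((xs[i - 1], ys[i - 1]), xs[i]) for i in range(1, len(xs))])
    return h_x, f_x, h_x - f_x

# Toy dialogue: source x echoes, one step late, what it received from y, so most of
# its entropy should show up as transinformation supplied by the y sequence.
ys = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0] * 25
xs = [0] + ys[:-1]

h_x, f_x, t_xy = directed_transinformation(xs, ys)
print(f"H(x) = {h_x:.3f} bits, F(x) = {f_x:.3f} bits, T(x|y) = {t_xy:.3f} bits")
```

With the echoing toy sequences, F(x) falls to essentially zero and nearly all of H(x) is directed transinformation from y; feeding in two independent sequences instead gives T(x|y) close to zero, the unidirectional (monologue) case described above.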
Quantities associated with entropy of bidirectional communication

Entropy of x: H(x) = lim_{n→∞} E[−log p(x | x_n)]
Entropy of y: H(y) = lim_{n→∞} E[−log p(y | y_n)]
Free entropy of x: F(x) = lim_{n→∞} E[−log p(x | x_n y_n)]
Free entropy of y: F(y) = lim_{n→∞} E[−log p(y | x_n y_n)]
Directed transinformation y → x: T(x|y) = H(x) − F(x)
Directed transinformation x → y: T(y|x) = H(y) − F(y)
Total transinformation (coincidence, synentropy): T = T(x|y) + T(y|x)

Fig. 2. Flow diagram of communication.

Fig. 3. Coupling diagram of communication. (The axes are the coupling coefficients σ_x and σ_y: the origin corresponds to decoupling, points on the axes to a monologue, interior points to a dialogue, the line σ_x + σ_y = 1 to maximum coupling, and the corners σ_x = 1 or σ_y = 1 to one-sided suggestion.)

Maximal coupling, however, is limited. If the coupling coefficients σ_x and σ_y are defined by Eqs. (4) and (5), σ_x = T(x|y)/H(x)   (4) and σ_y = T(y|x)/H(y)   (5), then the limiting condition σ_x + σ_y ≤ 1 holds. According to that inequality, all possible coupling states of a communication are shown in the coupling diagram of Fig. 3. The maximum coupling occurs when σ_x + σ_y = 1. Complete (quasi-deterministic) coupling can obviously occur in only one direction. With respect to a human communication, it is referred to as suggestion. Bidirectional communication theory has been applied to the group behavior of monkeys, but it may also be used to describe the theoretical aspects of user–machine interaction. See HUMAN-MACHINE SYSTEMS; INFORMATION THEORY. Hans Marko
Bibliography. R. E. Blahut, Communications, Probability, Information Theory, Coding, 1987; C. Cherry, On Human Communication: A Review, a Survey and a Criticism, 3d ed., 1978; R. G. Gallager, Information Theory and Reliable Communication, 1968; D. L. Kincaid (ed.), Communication Theory: Eastern and Western Perspectives, 1987; S. Oyama, Ontogeny of Information: Developmental Systems and Evolution, 2d ed., 2000; J. R. Sampson (ed.), Biological Information Processing: Current Theory and Computer Simulation, 1989; C. Sybesma, Biophysics: An Introduction, 1989.

Infrared astronomy
The field of astronomical observations specializing in detecting photons from the infrared portion of the electromagnetic spectrum. Detection technology has traditionally defined the domains of the electromagnetic spectrum. The infrared portion of the spectrum spans the range from the red limit of human vision (approximately 0.7 micrometer) to the shortest wavelengths accessible to heterodyne radio receivers (several hundred micrometers). See ELECTROMAGNETIC RADIATION; INFRARED RADIATION. Differences from optical astronomy. Astronomers observing the universe with infrared light encounter a number of fundamental differences relative to those observing with visible light: 1. Infrared observations are more sensitive to cooler objects than visible-wavelength observations. Blackbodies at a temperature cooler than 2000 kelvins (3100◦F) radiate virtually all of their light in the infrared part of the spectrum (Fig. 1). Infrared observations are particularly well suited to detect both forming stars and evolved stars (that is, stars in the final stages of their lives) since both classes of objects are cool. Since starlight often heats nearby dust grains to temperatures of tens or hundreds of degrees, reprocessing the visible starlight into exclusively infrared radiation, warm dust is also a common target for infrared observations. See HEAT RADIATION. 2. Interstellar dust is substantially more transparent at infrared wavelengths than at visible wavelengths. The dust grains constituting the interstellar medium tend to be smaller than 1 µm in diameter and are mainly composed of silicate and carbon grains. The theory of the scattering of electromagnetic radiation by dielectric particles dictates that such particles efficiently scatter wavelengths of light which
Fig. 1. Plots of the energy flux emitted by a blackbody as a function of wavelength for several temperatures. Hotter objects are substantially brighter and emit most of their energy at shorter wavelengths. Objects cooler than a few thousand kelvins emit predominantly infrared radiation. (The vertical axis is the product of the energy flux per unit wavelength and the wavelength itself.)
are equal to or smaller than the particle’s size. Since infrared light has a wavelength substantially larger than the size of interstellar grains, the grains are ineffective at extinguishing the infrared light of stars and galaxies which lie behind the dust. Infrared observations thus enable astronomers to view distant objects through the obscuring dust that permeates the Milky Way Galaxy. Forming and evolved stars often reside in dense clouds of interstellar dust grains and can be observed only at infrared and even longer radio wavelengths. See INTERSTELLAR EXTINCTION; INTERSTELLAR MATTER; SCATTERING OF ELECTROMAGNETIC RADIATION. 3. The quantized energies of molecular rotational and vibrational transitions, which give rise to molecular spectral lines, fall largely in the infrared part of the spectrum, as do many hyperfine lines of individual atoms. Infrared spectral lines of water, molecular
Fig. 2. Atmospheric transmission as a function of wavelength throughout the visible and infrared parts of the spectrum, showing atmospheric windows.
hydrogen, and carbon monoxide probe dense interstellar environments and cool stellar atmospheres. Very small interstellar grains can be considered as immense molecules. The bending and stretching modes of (CH) and (SiO) on the surface of these grains produce broad infrared spectral features diagnostic of the grains and, in cold environments such as dark interstellar clouds, of their icy mantles. See INFRARED SPECTROSCOPY. In addition to these astrophysical differences, the technology and practice of infrared astronomy differs from visible-wavelength astronomy in fundamental ways: 1. The Earth’s atmosphere is opaque to infrared radiation through a substantial fraction of the spectrum (Fig. 2). This opacity arises largely from water in the Earth’s lower atmosphere. Water’s infrared spectral absorption lines blanket the infrared spectrum, particularly when blurred together by the pressure broadening which occurs in the dense terrestrial atmosphere. At the shorter infrared wavelengths, the infrared spectrum has several windows of transmission which permit ground-based telescopes to observe celestial sources. At the longer infrared wavelengths, the atmosphere is nearly opaque, and astronomers must resort to high-flying aircraft or orbiting satellites in order to evade the atmospheric opacity. See SATELLITE (ASTRONOMY). 2. The energy of chemical bonds and the work function for the liberation of electrons from a metal via the photoelectric effect is of order a few electronvolts. Planck’s law, E = hν = hc/λ, which relates the energy E of a photon to its frequency ν or its wavelength λ via Planck’s constant (h = 6.63 × 10−34 joule-second) and the speed of light (c = 3 × 108 m/s), dictates that only photons with wavelength shorter than about 1 µm can induce chemical reactions or liberate free electrons. Thus physics limits photography and photon-counting photomultiplier tubes to operation mainly in the visible-wavelength domain. Infrared detection technology relies largely on either mimicking the photoelectric effect inside a crystalline semiconductor material or monitoring the temperature change of a semiconductor under the influence of infrared radiation. See PHOTOEMISSION. 3. Objects of room temperature (300 K or 80◦F) emit radiation throughout most of the infrared spectrum. This glow interferes with the detection of faint astronomical sources, limiting the sensitivity of observations. Rapid comparison and differencing of the signal from the source direction versus the adjacent sky can mitigate some of these effects. Cooling of the entire telescope—impractical on the ground but possible in the vacuum of space—can substantially reduce the thermal glow of the telescope optics and result in unprecedented sensitivity to faint astronomical sources. Infrared technology. Detectors of infrared radiation divide into two classes: bolometers and photovoltaic or photoconductive devices. Bolometer detectors have temperature-sensitive electrical conductivity. Incident radiation warms the detector, and
Infrared astronomy the resulting subtle change in the electrical resistance of the detector is measured. Infrared photodetectors are crystalline semiconductors in which some electrons in the crystal lattice lie only a short distance in energy away from becoming unbound and behaving like metallic conducting electrons. Infrared light with energy in excess of the binding energy creates free charge carriers, either changing the bulk conductivity of the device (photoconductors) or charging or discharging a semiconductor junction (photovoltaics). The detector’s response to infrared radiation depends on the chemistry-dependent solidstate physics of the detector material. In a mixture of indium and antimony (InSb), for example, the binding energy of the electrons corresponds to an infrared wavelength of 5.5 µm. This material is responsive to photons of this wavelength or shorter. Different semiconductor materials have different long-wavelength cutoffs. By doping the semiconductor material with impurities, it is possible to produce even more weakly bound electrons, yielding detectors with sensitivity to wavelengths as long as 100 µm or more. See BOLOMETER; PHOTOCONDUCTIVITY; PHOTOVOLTAIC EFFECT. Since the mid-1980s, large-scale integration of semiconductor components has permitted the production of arrays of infrared detectors. These arrays now exist in formats as large as 2048 × 2048 elements (4 million detectors on a single device). Each detector on such an array is also substantially more sensitive than its 1980s counterpart. Infrared astronomy has blossomed thanks to the multiple advantages of observing with so many detectors simultaneously. Systems of mirrors, gratings, prisms, and lenses direct light to infrared detectors in applications such as photometers, cameras, and spectrographs. Ordinary glass lenses and prisms are insufficiently transmissive at wavelengths longer than 2 µm to be useful in an infrared optical system. Instead, infrared transmissive materials such as calcium fluoride, zinc selenide, silicon, and germanium can be fabricated into lenses and prisms. Calcium fluoride, for example, is transparent to infrared light of wavelengths as long as 10 µm. Mirrors and diffraction gratings in infrared optical systems are often gold-coated since gold is a particularly efficient reflector across the entire infrared region of the spectrum. Many instruments depend on the use of wavelength-band-limiting filters for their operation. These filters are produced by coating an infrared transmissive substrate with multiple alternating layers of material of differing refractive index. With proper tuning of the thickness and number of layers, the constructive and destructive interference of the electromagnetic waves reflected from the layer boundaries can efficiently transmit a well-defined range of wavelengths. See INTERFERENCE FILTERS; OPTICAL MATERIALS; REFLECTION OF ELECTROMAGNETIC RADIATION. Astronomical targets. The targets of infrared observations include ordinary stars and galaxies, planets, brown dwarfs, young stellar objects, evolved stars, starburst galaxies, and redshifted radiation.
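The detector physics sketched above can be checked numerically with Planck's law E = hc/λ quoted earlier. The lines below are an added illustration, not part of the article; the 0.23-eV binding energy printed for InSb is inferred here from the 5.5-µm cutoff stated in the text, and the function name is arbitrary.

```python
H_PLANCK = 6.63e-34   # Planck's constant, joule-seconds (value quoted in the text)
C_LIGHT = 3.0e8       # speed of light, m/s (value quoted in the text)
EV = 1.602e-19        # joules per electronvolt

def photon_energy_ev(wavelength_um):
    """Photon energy E = h*c/lambda for a wavelength given in micrometers, in eV."""
    return H_PLANCK * C_LIGHT / (wavelength_um * 1e-6) / EV

print(round(photon_energy_ev(1.0), 2))    # ~1.24 eV: the ~1-um photoelectric/chemical limit
print(round(photon_energy_ev(5.5), 2))    # ~0.23 eV: binding energy implied by InSb's 5.5-um cutoff
print(round(photon_energy_ev(100.0), 4))  # ~0.012 eV: what a doped detector reaching 100 um must bind
```

The same relation is why photography and photon-counting photomultiplier tubes, which need roughly an electronvolt per photon, are confined to wavelengths shorter than about 1 µm.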
Ordinary stars and galaxies. Although popular interest in infrared astronomy focuses on exotic objects observable only using infrared light, infrared observations continue to play a fundamental role in understanding the more pedestrian stars and galaxies that constitute most of the visible-wavelength universe. Just as at visible wavelengths, where the colors and luminosities of stars divide them into distinct spectral classes via the Hertzsprung-Russell diagram, stars can be classified as to their temperature and luminosity class via their infrared “colors.” In this case, color refers to the ratio of the brightness of an object compared at two different infrared wavelengths. Sources with proportionally more flux at longer infrared wavelengths are referred to as “red” by infrared astronomers in analogy to objects which literally appear red using visible light. Color-based distinctions can also be made among classes of galaxies. At infrared wavelengths, interstellar dust within the Milky Way Galaxy becomes only a minor hindrance to seeing stars across or galaxies beyond the galactic plane, permitting a more complete census of these objects. At infrared wavelengths longer than 10 µm, infrared light emitted from dust warmed by stars in normal galaxies begins to dominate the infrared light from stars, making dusty spiral galaxies prime targets for mid-infrared observations. See GALAXY, EXTERNAL; HERTZSPRUNG-RUSSELL DIAGRAM; MILKY WAY GALAXY; STAR. Planets. At the shortest infrared wavelengths, planets and asteroids are observed via reflected sunlight. Their infrared spectra are diagnostic of their surface mineralogy and atmospheric chemistry. The outer gas giants have near-infrared reflected spectra which are dominated by the absorption bands of molecular hydrogen and methane in their atmospheres. At infrared wavelengths longer than 3 µm, thermal radiation from the planets begins to dominate over reflected sunlight. Jupiter, still cooling from its initial formation, emits twice as much energy as it receives from the Sun—nearly all of it at infrared wavelengths. See ASTEROID; JUPITER; PLANET; PLANETARY PHYSICS. Brown dwarfs. An object with a mass less than 8% that of the Sun (equivalent to 80 times the mass of Jupiter) is incapable of burning hydrogen in its core to sustain its luminosity—the hallmark of being a star. At the time of their formation, the interiors of such substellar objects are warmed by their gravitational contraction from an initially diffuse interstellar cloud of gas. Immediately following their formation, their surface temperature mimics those of the coolest stars (2000 K or 3100◦F). Even at this most luminous point in their evolution, most of the light emerges in the infrared part of the spectrum. Since they have no internal source of energy, the radiation emerging from their surface causes them to cool over time. An object with 3% of the mass of the Sun cools to a surface temperature of 300 K (80◦F) after only about 109 years. Infrared surveys of the entire sky are beginning to reveal large numbers of these brown dwarfs, and they appear to be more common than ordinary stars in the Milky Way Galaxy.
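An infrared "color" is defined above as the ratio of an object's brightness at two different infrared wavelengths. The sketch below makes that concrete for ideal blackbodies; it is an added illustration rather than part of the article, the Planck radiance formula is assumed (the article appeals to the blackbody curves of Fig. 1 but does not write the formula out), and the comparison wavelengths of 2.2 and 10 µm are arbitrary choices.

```python
import math

H = 6.626e-34    # Planck's constant, J*s
C = 2.998e8      # speed of light, m/s
KB = 1.381e-23   # Boltzmann's constant, J/K

def planck_radiance(wavelength_m, temperature_k):
    """Blackbody spectral radiance B_lambda(T) in W per m^2 per m per steradian."""
    x = H * C / (wavelength_m * KB * temperature_k)
    return (2.0 * H * C ** 2 / wavelength_m ** 5) / math.expm1(x)

def infrared_color(temperature_k, short_um=2.2, long_um=10.0):
    """Brightness ratio long/short; a larger value is a 'redder' infrared color."""
    return (planck_radiance(long_um * 1e-6, temperature_k)
            / planck_radiance(short_um * 1e-6, temperature_k))

for t in (300, 1000, 3000, 6000):   # warm dust, dust shells, cool stars, sunlike stars
    print(f"{t:>5} K: color (10 um / 2.2 um) = {infrared_color(t):.3g}")
```

The cooler the source, the larger the ratio: this is the sense in which evolved dusty stars and warm interstellar dust are "red" and stand out at the longer infrared wavelengths.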
Infrared astronomy Young stellar objects. Young stellar objects (YSOs) are newly formed or forming stars. These stars are being assembled by gravity out of dense (1000 atoms per cubic centimeter) interstellar clouds. The environment surrounding the forming star is naturally very dusty, and the slight rotation of the natal cloud combined with gravity drives the infalling material to form a thin flattened disk around the star. This disk may be heated either by the friction of the infalling material or by radiation from the central star. In either case, the disk temperatures range from 2000 K (3100◦F) close to the star to 20 K (−424◦F) in the outer region of the disk. Virtually all of the disk emission emerges at infrared wavelengths. Planets accrete within these disks, and infrared observations provide the primary astrophysical insight into the process of planetary formation. See PROTOSTAR. Evolved stars. After exhausting the initial supply of hydrogen in the stellar core, stars reconfigure themselves to liberate the energy of hydrogen shell burning around the core and helium burning within the core. Stellar physics dictates that the star must grow to large size and become cool at its surface in order to dissipate the energy being produced within. At this stage, some fraction of stars become unstable to pulsation. Their outer envelopes are weakly bound and easily lost. Dust grains condensing in the expanding envelope can completely enshroud the star. Under these circumstances, the observed radiation emerges entirely in the infrared part of the spectrum as the dense surrounding dust absorbs the short-wavelength starlight and, warmed only to a temperature of a few hundred kelvins, emits exclusively infrared radiation. Stars slightly more massive than the Sun, which are common throughout the Milky Way Galaxy, evolve through this stage. Since these stars are intrinsically luminous and numerous, and because the emergent infrared luminosity penetrates the dust which permeates the Milky Way Galaxy, these stars are ideal tracers of the structure of the Galaxy. See GIANT STAR; STELLAR EVOLUTION. Starburst galaxies. Although normal galaxies emit most of their light at the boundary between the visible and infrared portions of the spectrum, galaxies undergoing active bursts of star formation can produce most of their radiation at wavelengths of 10 µm or longer. The radiation emerges in this part of the spectrum because the star-forming regions are embedded in dust clouds which absorb the starlight and, having been warmed to temperatures of tens of kelvins, reradiate energy largely in the infrared portion of the spectrum. See STARBURST GALAXY. Ultraluminous infrared galaxies. The gravitational interaction between two gas-rich galaxies can induce both objects to undergo an extensive burst of star formation. The resulting energy release can augment the flux of the galaxy by a factor of 10 or more with most of the radiation arising in the infrared part of the spectrum. Such “ultraluminous” infrared galaxies are among the most luminous galaxies in the universe. Redshifted radiation. The apparent Doppler shift due to the expansion of the universe causes the ultravi-
olet and visible light originally emitted by extremely distant stars and galaxies to be shifted into the infrared part of the spectrum. Redshifts this large originate from objects at distances of 1010 light-years or more. Since the light collected from these objects was emitted by them 1010 years ago, these observations probe the state of the universe at the earliest times. Many observational investigations of the early universe focus on collecting and analyzing infrared light from the most distant objects known. Bolometer arrays capable of imaging at a wavelength of 300 µm are now able to detect the first galaxies which formed following the big bang. See BIG BANG THEORY; COSMOLOGY; REDSHIFT. Ground-based infrared astronomy. Telescopes dedicated to visible-wavelength astronomy are also effective collectors of infrared radiation. Equipped with infrared focal-plane-array imagers and spectrographs, these telescopes can observe the infrared universe through the accessible atmospheric windows (Fig. 1). Such observations are particularly effective shortward of 2 µm, where thermal emission from the telescope is negligible relative to other backgrounds. Nearly every telescope larger than 2 m (80 in.) in aperture has one or more dedicated infrared instruments that share time with visible-wavelength images and spectrographs. Since moonlight interferes strongly with visible observations but hardly affects infrared observations, instruments are scheduled in synchrony with the phases of the Moon. Near full moon, most large telescopes conduct infrared observations. Ground-based infrared observations are less susceptible to blurring due to atmospheric turbulence than visible-wavelength observations. Under ideal circumstances, only diffraction of light through the telescope aperture should blur stellar images. At visible wavelengths, atmospheric turbulence broadens images by a factor of 50 or more compared with the diffraction limit. Modern computers have enabled the incorporation of flexible mirrors into telescope optical paths which compensate for the blurring effects of the atmosphere. At visible wavelengths, the atmosphere changes too rapidly to permit optimal removal of the effects of atmospheric turbulence. The changes occur more slowly at infrared wavelengths, and such adaptive-optics systems can compensate for the atmosphere’s effects and yield images with diameters limited by diffraction alone (Fig. 3). In addition to providing images of finer detail, the sharper images concentrate more light into a smaller region, making it possible to detect fainter objects. See ADAPTIVE OPTICS. Infrared space missions. The opacity of the Earth’s atmosphere at most infrared wavelengths and the need to place a telescope in an environment where the entire telescope structure can be cooled to cryogenic temperatures have motivated a number of extremely successful satellite missions largely devoted to infrared astronomy. IRAS. The Infrared Astronomy Satellite (IRAS) conducted the first survey of nearly the entire sky at wavelengths of 12, 25, 60, and 100 µm. The
Infrared astronomy
Fig. 3. Example of the improvement in both sharpness and intensity of a stellar image with the application of adaptive optics to compensate for atmospheric turbulence. Plots of image intensity with adaptive optics (a) turned off and (b) on show improvement both in resolution from 0.6 to 0.068 arcsecond (as measured by the full width of the peak at half its maximum amplitude) and in peak intensity, enabling the detecction of fainter objects. Inset shows the corresponding images. Images were obtained at the Canada-France-Hawaii Telescope (aperture of 3.3 m or 130 in.) at a wavelength of 0.936 µm with an exposure time of 30 s. (University of Hawaii)
three longest wavelengths are accessible almost exclusively from space. Launched by the National Aeronautics and Space Administration in 1983, IRAS contained a 0.6-m (24-in.) telescope cooled to liquid helium temperature and a focal plane of dozens of individual infrared detectors which scanned across the sky. The resulting catalogs of 350,000 stars and galaxies, along with accompanying images of most of the sky, remain a fundamental scientific resource. As might be expected when opening an entirely new wavelength regime to sensitive observation, the IRAS mission revealed a number of fundamentally new phenomena, including residual dust disks around main-sequence stars (presumably the debris of evolving solar systems), ultraluminous infrared galaxies, and tenuous filaments of interstellar dust now known as infrared cirrus. ISO. The IRAS mission placed the infrared sky in context with an all-sky survey. In 1995, the European Space Agency launched the Infrared Space Observatory (ISO), the first infrared observatory with the capability of making detailed photometric and spectroscopic studies of individual objects at wavelengths ranging from 2.5 to 240 µm. The ISO’s 0.6-m (24-in.) liquid-helium-cooled telescope operated for 21/2 years, making more than 20,000 independent observations of astronomical sources. The telescope’s focal plane included an infrared camera (ISOCAM), with sensitivity from 2.5 to 17 µm, which imaged the sky onto two different 32 × 32-element infrared focal-plane arrays; an infrared photometer and polarimeter (ISOPHOT), which used individual detectors to measure precisely the quantity of infrared radiation from celestial sources with detectors sensitive between 2.5 and 240 µm; and a pair
of spectrometers observing at wavelengths between 2.4 and 200 µm. ISO observations have contributed to nearly every area of astrophysics, ranging from spectroscopic studies of planetary atmospheres to infrared imaging of forming galaxies in the early universe. SIRTF. Scheduled for launch into an Earth-trailing orbit in 2001, the Space Infrared Telescope Facility (SIRTF) will place a sophisticated battery of infrared cameras and spectrographs behind a 0.85-m (33.5-in.) liquid-helium-cooled telescope with a 5-year cryogen lifetime. SIRTF’s infrared array camera (IRAC) will contain four 256 × 256-element infrared arrays spanning the wavelength range from 3.6 to 8.0 µm. These arrays are substantially larger and more sensitive than those available on ISO and illustrate the rate of advance of infrared imaging technology. SIRTF will have an imaging capability to wavelengths as long as 160 µm and spectroscopic coverage between 4 and 40 µm. SIRTF joins the Compton Gamma-Ray Observatory, the Chandra X-ray Observatory, and the Hubble Space Telescope (observing at visible and near-infrared wavelengths) as one of NASA’s “great observatories,” which operate across the electromagnetic spectrum. SIRTF will illuminate nearly every area of astrophysics and is particularly well suited to revealing the abundance of cool brown dwarfs and ultraluminous infrared galaxies. Prospects. More ambitious plans for space telescopes emphasize infrared observational capability because of the range of astrophysics addressed in this wavelength regime as well as relaxed design constraints on telescopes, since infrared wavelengths are several times larger than visible-light wavelengths. NASA’s long-term goal involves detecting and imaging planets around other stars. Infrared imaging and interferometry play a central role in the realization of this objective. Since the planets of interest naturally reside next to an extremely bright star, an interferometer—which combines the light from two or more well-separated telescopes—is required to isolate the planet’s feeble light from that of the star. Interferometry has been pioneered largely at infrared wavelengths from the ground, since the Earth’s atmosphere appears more stable when observing infrared light than when observing visible light. In addition, the contrast between the star and adjacent planet is substantially improved at infrared wavelengths, making the planet easier to discern in the star’s glare. Infrared spectroscopic observations are diagnostic of water and ozone in planetary atmospheres and will be fundamental in assessing the probability that life could arise or even has developed on extrasolar planets. Michael F. Skrutskie Bibliography. G. Basri, The discovery of brown dwarfs, Sci. Amer., 282(4):76–83, April 2000; H. Freudenreich, Deconstructing the Milky Way, Amer. Sci., 87(5):418–427, September–October 1999; C. Kraan-Korteweg and O. Lahav, Galaxies behind the Milky Way, Sci. Amer., 279(4):50–57, October 1998; M. Skrutskie, Charting the infrared sky, Sky Telesc., 94(2):46–47, August 1997.
Infrared imaging devices
Infrared imaging devices Devices that convert an invisible infrared image into a visible image. Infrared radiation is usually considered to span the wavelengths from about 0.8 or 0.9 micrometer to several hundred micrometers; however, most infrared imaging devices are designed to operate within broad wavelength regions of atmospheric transparency, that is, the atmospheric windows. At sea level, for horizontal paths of a few kilometers’ length, these are approximately at 8– 14 µm, 3–5 µm, 2–2.5 µm, 1.5–1.9 µm, and wavelengths shorter than 1.4 µm. The radiation available for imaging may be emitted from objects in the scene of interest (usually at the longer wavelengths called thermal radiation) or reflected. Reflected radiation may be dominated by sunlight or may be from controlled sources such as lasers used specifically as illuminators for the imaging device. The latter systems are called active, while those relying largely on emitted radiation are called passive. Active optical imaging systems were developed to achieve a nighttime aerial photographic capability, and work during World War II pushed such systems into the near-infrared spectral region. Development of passive infrared imaging systems came after the war, but only the advent of lasers allowed creation of active infrared imagers at wavelengths much longer than those of the photographic region. Striking advances have been made in active infrared systems which utilize the coherence available from lasers, and hybrid active-passive systems have been studied intensively. See INFRARED RADIATION; LASER. Although developed largely for military purposes, infrared imaging devices have been valuable in industrial, commercial, and scientific applications. These range from nondestructive testing and quality control to earth resources surveys, pollution monitoring, and energy conservation. Infrared images from aerial platforms are used to accomplish “heat surveys,” locating points of excessive heat loss. An example is shown in Fig. 1a. As discussed below, calibration allows association of photographic tones in this figure with values of apparent (that is, equivalent blackbody) temperatures. Dark areas in the figure are “colder” than light ones. See NONDESTRUCTIVE EVALUATION; REMOTE SENSING. Scanning systems. Infrared imaging devices may be realized by electrooptical or optomechanical scanning systems. All have an optical head for receiving the infrared radiation and a display for the visible image. These are connected by electronics for the passage of electrical signals from the detector element(s) to the display input. Signal processing may be incorporated in the electronics to selectively enhance or reduce features in the produced visible image. For example, in Fig. 1b a “levelslice” technique presents in white all areas (mainly rooftops) with apparent temperatures between −7.9 and −8.9◦C or 17.4 and 15.6◦F. (The ambient air temperature was −5◦C or 23◦F.) Black regions in the fig-
(b) Fig. 1. Thermal imagery in the wavelength range 10.4– 12.5 µm obtained during flights over Ypsilanti, Michigan, at 2400 hours, November 23, 1975, by the Airborne Multispectral Scanner operated by the Environmental Research Institute of Michigan. (a) Calibrated thermal imagery. (b) Signal-processed thermal imagery of same scene. (From F. Tannis, R. Sampson, and T. Wagner, Thermal imagery for evaluation of construction and insulation conditions of small buildings, Environmental Research Institute of Michigan, ERIM Rep. 116600-12-F, July 1976)
ure correspond to apparent temperatures below or above the narrow “sliced” temperature range of the white regions. Optomechanical methods such as rotating prisms or oscillating mirrors may be used to sample or scan the spatial distribution of infrared radiation in either the object or image plane. Electrooptical imaging devices may use an electron beam (for example, vidicons) or charge transport in solids (for example, charge-coupled devices, or CCDs) to scan the infrared image formed by the optics of the device. This image-plane scanning places more stringent requirements upon the optics for image quality offaxis than does use of mechanically moved optical elements placed before the entrance aperture of the system. Intensive development of pyroelectric vidicons, detector arrays, and infrared charge-coupled devices (IRCCDs) has taken place, reflecting the critical role played by the detector element in all infrared systems. The spectral, spatial, and temporal responses of detectors are the major factors in determining the wavelength regions, the spatial resolution, and the frequency response (that is, the time constant) of imaging devices. See CHARGE-COUPLED DEVICES. Detector arrays. Optomechanical scanning methods often stress the response time of the detectorelectronics package. As a result, multiple detectors or detector arrays are sometimes incorporated in the focal planes, resulting in partially electronically scanned optomechanical systems. The technology for use of a linear array of detector elements (often lead selenide and indium antimonide detectors for the 3–5-µm region, and mercury-doped germanium and mercury cadmium telluride for the 8–14-µm window) is well developed, and the use of a two-dimensional array or matrix of detectors has
Infrared imaging devices been studied. Optomechanical imagers incorporating such arrays allow the use of time delay and integration (TDI) of the signals to improve the resulting signal-to-noise ratios. Solid-state components such as charge-coupled devices afford the opportunity for implementation of signal processing directly at the focal plane. Two approaches have been undertaken to attain the focalplane array technology of infrared charge-coupled devices. In one, the development of a hybrid device, an infrared detector matrix of any suitable photodetector material, for example, indium antimonide, mercury cadmium telluride, and lead tin telluride, is mated with a conventional silicon charge-coupled device. Thus two solid-state wafers or “chips” are integrated to obtain an infrared charge-coupled device. In the other, the goal is a monolithic chip, one incorporating the photodetection, charge generation, and charge transfer in a structure made of essentially one material. Candidate materials include impuritydoped silicon, indium antimonide, and mercury cadmium telluride. The hybrid device technology can be implemented more readily than that needed for monolithic infrared charge-coupled devices. The development of infrared charge-coupled devices with the number of detecting elements in a sufficiently closely packed array required for high-performance infrared imaging devices involves additional difficulties. Scanning motion. Some optomechanical imagers produce a two-dimensional scan entirely by movement of components of the device itself; others utilize the motion of a platform such as an aircraft or satellite. The first kind of system includes the forwardlooking infrared (FLIR) or framing imagers which usually scan in televisionlike raster patterns and display, synchronously if done in real time, a visible image corresponding to the spatial distribution of infrared radiation. These visible image outputs have been named thermographs and thermograms. Commercially available imaging devices of this type have used horizontally and vertically rotating germanium prisms, mirrors oscillated in orthogonal directions, two mirrors and a six-sided rotating prism, and other schemes to produce images at relatively slow rates from 16 per second to less than a quarter of a frame per second. Higher-performance systems have been produced for military purposes. The second class of imaging systems includes those often called line scanners or thermal mappers. One such system, the 12-channel Airborne Multispectral Scanner operated by the Environmental Research Institute of Michigan (ERIM; Fig. 2), includes two thermal radiation channels, at 8.2–9.3 µm and 10.4–12.5 µm whose magnetic-tape recorder output was processed to produce the thermal imagery in Fig. 1. Multispectral imaging systems can be used to collect large amounts of data that may be processed to create special-purpose images. Advanced image data processing techniques are being used in remote sensing applications such as mineral detection and
assessment, vegetation cover classification, agricultural crop monitoring and management, and urban analyses and planning. If the system has more than about 30 spectral channels with relatively fine spectral resolution, imaging spectroscopy can be carried out. These hyperspectral imaging systems, compared with multispectral ones, have data outputs better described by more complex representations than simple images. Hypercubes, many-dimensional vectors that need multidimensional data analysis methodologies, have become especially useful. Industrial and laboratory applications now include spatially resolved infrared microspectroscopy. Such systems are becoming available as a direct result of the creation of modern focal-plane arrays (FPAs) for use in, for example, fast Fourier transform infrared (FT-IR) imaging systems. A simple illustration of a system for remote sensing applications from airborne or space-based platforms would look similar
direction of flight total field of view (b) Fig. 2. Airborne Multispectral Scanner. (a) Schematic diagram of equipment. (b) Scanning operation, utilizing motion of aircraft. (After F. Tannis, R. Sampson, and T. Wagner, Thermal imagery for evaluation of construction and insulation conditions of small buildings, Environmental Research Institute of Michigan, ERIM Rep. 116600-12-F, July 1976)
Infrared lamp to Fig. 2a, with the detector replaced by an entrance aperture for a multispectral or hyperspectral dispersing element, such as a prism or grating or interferometric subsystem. See FOURIER SERIES AND TRANSFORMS. Characterization of output. The instantaneous field of view (IFOV) or resolution element of imaging systems is always geometrically filled by the radiating source, so that the output of the device is a response to changes in amount of radiation from field of view to field of view. These changes are best characterized in terms of radiance L, the radiant flux per unit area per unit solid angle, usually in a selected spectral band of wavelengths, λ. Even in the infrared regions, the radiance variation may be ascribed to changes in reflectance, incident radiation, emissivity, or temperature. By restriction to wavelengths dominated by emission, the so-called thermal wavelengths longer than 3.5 µm, the radiance change can be described by the equation below, where T is the L =
ΔL = (∂L/∂T) ΔT + (∂L/∂ε) Δε
absolute temperature and is the emissivity. Contributions due to are usually treated as changes in an equivalent blackbody temperature by setting = 1 and = 0. Then T represents an equivalent blackbody temperature, and the radiance variation can be ascribed entirely to a value of T. That value of T corresponding to a radiance difference which will just produce a signal-to-noise ratio of 1 is called the noise equivalent temperature difference (NETD). One can also characterize the performance of an imaging system by a noise equivalent emissivity difference or even a noise equivalent reflectivity difference. The use of noise equivalent temperature difference as a figure of merit for thermal imagers is obviously more appropriate. For the higher-performance forward-looking infrared systems, a useful figure of merit is the minimum resolvable temperature difference (MRTD), a figure of merit which includes the characteristics of the display and of the observer as well. See EMISSIVITY; HEAT RADIATION. Display. If a visible image is the desired final output of an infrared imaging device, it may be displayed as a conventional television picture by means of a cathode-ray tube (CRT). Cathode-ray-tube technology has been developed to a level that is suitable, and research has been undertaken toward creation of satisfactory flat panel displays using liquid crystal elements, light-emitting diodes, or plasma panels. Systems not requiring a real-time image display may utilize analog or digital data storage or transmission systems, which then are used to produce permanent visual records such as photographs. High-resolution “hard copy” images can be produced by sophisticated systems using electron-beam or laser recording on film. Complex signal-processing techniques are easily introduced before the final image recording is made. See CATHODE-RAY TUBE; LIGHT-EMITTING DIODE; LIQUID CRYSTALS. George J. Zissis
Bibliography. J. S. Accetta and D. L. Shumaker (eds.), The Infrared and Electro-Optical Systems Handbook, vols. 1–8, ERIM and SPIE, 1993; M. Bass with E. Stryland, D. R. Williams, and W. L. Wolfe (eds.), Handbook of Optics, 2d ed., vols. 1 and 2, Optical Society of America/McGraw-Hill, 1995; G. C. Holst, CCD Arrays, Cameras, and Displays, 2d ed., SPIE and JCD, 1998; G. C. Holst, Testing and Evaluation of Infrared Imaging Systems, 2d ed., SPIE and JCD, 1998; W. L. Wolfe, SPIE Tutorial Texts: Introduction to Infrared System Design, 1996, Introduction to Imaging Spectrometers, 1997, Introduction to Radiometry, 1998, and with G. J. Zissis (eds.), The Infrared Handbook, 1978, SPIE/ERIM-IRIA.
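As a numerical companion to the noise equivalent temperature difference discussed above, the sketch below evaluates how fast the 8–14-µm in-band radiance of a blackbody scene changes with temperature near 300 K. It is an added illustration, not part of the article: the Planck radiance formula, the band limits, and the simple midpoint integration are assumptions made here for concreteness.

```python
import math

H, C, KB = 6.626e-34, 2.998e8, 1.381e-23   # h (J*s), c (m/s), Boltzmann constant (J/K)

def planck_radiance(wavelength_m, temperature_k):
    """Blackbody spectral radiance, W per m^2 per m per steradian."""
    x = H * C / (wavelength_m * KB * temperature_k)
    return (2.0 * H * C ** 2 / wavelength_m ** 5) / math.expm1(x)

def band_radiance(temperature_k, lo_um=8.0, hi_um=14.0, steps=600):
    """In-band radiance L(T) over the 8-14 um window, midpoint rule, W per m^2 per steradian."""
    width_m = (hi_um - lo_um) / steps * 1e-6
    return sum(planck_radiance((lo_um + (i + 0.5) * (hi_um - lo_um) / steps) * 1e-6, temperature_k)
               for i in range(steps)) * width_m

T, dT = 300.0, 0.1
dl_dt = (band_radiance(T + dT) - band_radiance(T - dT)) / (2.0 * dT)
print("L(300 K) in 8-14 um:", round(band_radiance(T), 1), "W m^-2 sr^-1")
print("dL/dT near 300 K:   ", round(dl_dt, 3), "W m^-2 sr^-1 K^-1")
```

Dividing a detector's noise-equivalent radiance by a derivative of this kind is what turns a radiance sensitivity into an NETD expressed in kelvins.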
Infrared lamp A special type of incandescent lamp that is designed to produce energy in the infrared portion of the electromagnetic spectrum. The lamps produce radiant thermal energy which can be used to heat objects that intercept the radiation. All incandescent lamps have radiation in three regions of the electromagnetic spectrum, the infrared, the visible, and the ultraviolet. An infrared lamp with a filament operating at 2500 K will release about 85% of its energy in the form of thermal radiant energy, about 15% as visible light, and a tiny fraction of a percent as ultraviolet energy. The amount of infrared radiation produced by a lamp is a function of the input wattage of the lamp and the temperature of its tungsten filament. For most infrared lamps, the thermal radiation will be about 65–70% of the input wattage. A typical 250-W infrared lamp radiates about 175 W of thermal energy. Temperature and energy spectrum. The temperature of the incandescent filament determines the portion of the energy output allocated to each spectral region. As the filament temperature increases, the wavelength at which peak energy output occurs shifts toward the visible region of the spectrum (Fig. 1). Lamp filament temperatures are always stated in kelvins (K). See HEAT RADIATION; TEMPERATURE. Types. The lamps are supplied in two shapes (Fig. 2). The most common shape, for general use, is the R lamp (Fig. 2a), since the reflector unit is built in and the lamp needs only a suitable socket to form an infrared heating system. These lamps are available in 125-, 250-, and 375-W ratings, all in glass bulbs with a nominal reflector diameter of 12.5 cm (5.0 in.) and voltage rating of 115–125 V. The other type of infrared lamp, the tubular quartz bulb lamp (Fig. 2b), is used with a separate external reflector designed to distribute the heat as desired. More than 30 sizes of the tubular quartz bulb lamps are available in 375–5000-W ratings. Lighted lengths range from 12.5 cm (5 in.) to 127 cm (50 in.), overall lengths (including connecting terminals) from 22.38 cm (8.8 in.) to 138.68 cm (53.8 in.), and tube diameter is 0.95 cm (0.375 in.). Voltage ratings are a
function of length, with nominal values ranging from 120 V for short tubes to 960 V for the longest tubes. Because the lamp bases become quite hot, R lamps (Fig. 2a) should be used only in porcelain sockets, rather than in the common brass shell socket which would fail with prolonged use, posing a fire hazard. The R lamp is available with the glass face of the bulb stained red or ruby, reducing the amount of light emitted by the lamp. This reduces the undesirable glare in situations where the lamps must be in sight of the workers. Uses. The major advantage of infrared heating is that it is possible to heat a surface that intercepts the radiation without heating the air or other objects that surround the surface being heated. Infrared lamps have many uses, some of which will be discussed. Paint drying. Many industrial infrared ovens are used to dry painted surfaces as they pass by on a conveyor belt. Drying times are much shorter than for other methods of drying paint. Evaporative drying. Infrared lamps can be used to remove moisture or solvents from the surface of objects by speeding the evaporation process. Porous materials with internal moisture require longer exposure, since the entire object must be heated by conducting heat from the surface to the interior. Farm heating of animals. The lamps can be used to heat brooders and to keep other livestock warm without having to heat a large, poorly insulated shed. Food. Many R lamps are used in restaurant kitchens to keep food warm while it is waiting to be served to customers. Comfort heating. Areas under theater marquees can be made more pleasant for patrons waiting in line to buy tickets, and sidewalk cafes and similar areas can be made comfortable in relatively chill weather by installing suitable lamp systems. Other uses. These include therapeutic heating of portions of the body to relieve muscle strains, tobacco drying, textile and wallpaper drying, and drying of printing inks so that press operation can be speeded up without causing image offset. See THERMOTHERAPY. Design of heating systems. Most infrared heating ovens or systems are designed by calculating the number and wattage of lamps, plus experimenting with the system after it is constructed. The latter is necessary because the number of variables is large, and precise data on the magnitude of the variables are often difficult to obtain. Voltage and lifetime. Moderate variations in voltage will alter lamp operation somewhat. A 1% increase in voltage will increase both the input and radiated power in watts by 1.5–2%, and it will also increase the filament temperature and the light from the lamp. Most lamps are rated for use over a range of voltage, such as 115–125 V for a lamp intended for use on a 120-V power system. The life of all infrared lamps exceeds 5000 h. The greatest cause of life reduction is subjecting the lamp to vibration. Exposure to radiation. Exposure of persons to moderate levels of infrared is not known to be harmful.
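The power figures given for these lamps earlier in the article (thermal radiation equal to about 65–70% of input wattage, and a 1.5–2% change in power for each 1% change in supply voltage) can be restated as a small worked example. The snippet is illustrative only; the exponent of 1.7 used to model the voltage dependence is an assumption chosen to match the quoted 1.5–2% range, not a value given in the text.

```python
def radiated_thermal_watts(input_watts, efficiency=0.70):
    """Thermal radiation from an infrared lamp, taking ~70% of input per the text."""
    return efficiency * input_watts

def input_power_at_voltage(rated_watts, rated_volts, actual_volts, exponent=1.7):
    """Rough off-voltage input power, assuming P scales as V**1.7 (an assumption)."""
    return rated_watts * (actual_volts / rated_volts) ** exponent

print(radiated_thermal_watts(250))                       # 175.0 W, the 250-W example above
print(round(input_power_at_voltage(250, 120, 121.2), 1)) # ~254.3 W: +1% voltage, ~+1.7% power
```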
Fig. 1. Relative energy output of four incandescent lamps as a function of wavelength and filament temperature. As temperature increases, more of the lamp’s energy shifts to ◦ shorter wavelengths. F = (K × 1.8) − 460. (After Illuminating Engineering Society, IES Lighting Handbook, 8th ed., 1993)
(b) Fig. 2. Typical shapes of infrared lamps. (a) R lamp, with built-in reflector unit. (b) Tubular configuration with a quartz bulb.
High levels could cause burns, but the exposed individual feels the heat and would normally move away before a burn actually occurred. See INCANDESCENT LAMP; INFRARED RADIATION. G. R. Peirce; Robert Leroy Smith Bibliography. K. J. Button (ed.), Infrared and Millimeter Waves: Sources of Radiation, vol. 1, 1979; J. D. Hall, Industrial Applications of Infrared, 1947; Illuminating Engineering Society, IES Lighting Handbook, 8th ed., 1993; M. La Toison, Infrared and Its Thermal Applications, 1964; W. L. Wolfe and G. J. Zissis (eds.), The Infrared Handbook, 1978, reprint 1985.
Infrared radiation Electromagnetic radiation in which wavelengths lie in the range from about 1 micrometer to 1 millimeter. This radiation therefore has wavelengths just a little longer than those of visible light and cannot be seen with the unaided eye. The radiation was discovered in 1800 by William Herschel, who used a prism to refract the light of the Sun onto mercuryin-glass thermometers placed just past the red end of the visible spectrum generated by the prism. Because the techniques and materials used to collect, focus, detect, and display infrared radiation are different from those of the visible, and because many of the applications of infrared radiation are also quite different, a technology has arisen, and many scientists and engineers have specialized in its application. See ELECTROMAGNETIC RADIATION. Infrared techniques. A complete infrared system consists of a radiating source and background, intervening medium, optical system, detector, electronics, and display. Sources. The source can be described by the spectral distribution of power emitted by an ideal radiating body. This distribution is characteristic of the temperature of the body. A real body is related to it by a radiation efficiency factor also called emissivity. It is the ratio at every wavelength of the emission of a real body to that of the ideal under identical conditions. Figure 1 shows curves for these ideal bodies radiating at a number of different temperatures. The higher the temperature, the greater the total amount of radiation. The total number of watts per square meter is given by 5.67 × 10−8T4, where T is the absolute temperature in kelvins (K). Higher temperatures also provide more radiation at shorter wavelengths. This is evidenced by the maxima of these curves moving to shorter wavelengths with higher temperatures. Ideal radiators are also called blackbodies. See BLACKBODY; EMISSIVITY; HEAT RADIATION. The sources can be either cooperative or uncooperative. Some examples of the former include tungsten bulbs (sometimes with special envelopes), globars, and Nernst glowers. These are made of rareearth oxides and carbons. They closely approximate blackbodies and are used mostly for spectroscopy. Lasers have been used in special applications. Although they provide very intense, monochromatic, and coherent radiation, they are limited in their spectral coverage. The principal infrared lasers have been carbon dioxide (CO2) and carbon monoxide (CO) gas lasers and lead-tin-tellurium (PbSnTe) diodes. See INFRARED LAMP; LASER. Transmitting medium. The radiation of one of these sources propagates from the source to the optical collector. This path may be through the vacuum of outer space, 3 ft (1 m) of laboratory air, or some arbitrary path through the atmosphere. Figure 2 shows a low-resolution transmission spectrum of the atmosphere and the transmissions of the different atmospheric constituents. Two of the main features are the broad, high transmission regions between 3 and 5 µm and between 8 and 12 µm. These are
Fig. 1. Radiation from ideal bodies (blackbodies) at different temperatures, shown on (a) a linear scale and ◦ (b) a logarithmic scale. F = (K × 1.8) − 460.
the spectral regions chosen for most terrestrial applications. Figure 3 shows a small section of this spectrum, illustrating its complexity. The radiation from the source is filtered by the atmosphere in its propagation so that the flux on the optical system is related to both the source spectrum and the transmission spectrum of the atmosphere. Optical system. A lens, mirror, or a combination of them is used to focus the radiation onto a detector. Since glass is opaque in any reasonable thickness for radiation of wavelengths longer than 2 µm special materials must be used. The table lists the properties of some of the most useful materials for infrared instrumentation. In general, these are not as effective as glass is in the visible, so many infrared optical systems use mirrors instead. The mirrors are characterized by the blank and by its coating. Blanks are usually made of aluminum, beryllium, or special silica materials. The choices are based on high strength, light weight, and good thermal and mechanical stability. They are coated with thin evaporated layers of
Infrared radiation aluminum, silver, or gold. The reflectivities of thinfilm metallic coatings increase with wavelength, and the requirements for surface smoothness are also less stringent with increasing wavelength. See MIRROR OPTICS; OPTICAL MATERIALS. Detectors. Photographic film is not useful for most of the infrared spectrum. The response of the silver halide in the film, even when specially processed, is only sensitive to about 1.2-µm. It cannot respond to radiation in the important 3–5-µm and 8–12 µm atmospheric transmission bands. If there were a film for the infrared, it would have to be kept cold and dark before and after exposure. Accordingly, infrared systems have used point (elemental) detectors or arrays of them. These detectors are based either on the generation of a change in voltage due to a change in the detector temperature resulting from the power focused on it, or on the generation of a change in voltage due to some photon-electron interaction in the detector material. This latter effect is sometimes called the internal photoelectric effect. Electrons which are bound to the different atomic sites in the crystal lattice receive a quantum of photon energy. They are freed from their bound lattice positions and can contribute to current flow. The energy in electronvolts required to do this is 1.24/λ (where λ is the wavelength in micrometers). Thus only a very small binding energy, about 0.1 eV, is permitted in photon detectors. The thermal agitation of the lattice could cause spontaneous “emission,” so most infrared photodetectors are cooled to temperatures from 10 to 100 K (−424 to −280◦F). This does not affect the speed of response of photodetectors, which depends upon photon-electron interactions, but it does slow down thermal detectors. These are sometimes cooled because they generally respond to a relative change in temperature (and for a lower temperature, a small absolute change gives a larger relative change), and thermal noise is also reduced at low temperatures. See BOLOMETER; PHOTOCONDUCTIVE CELL; PHOTODIODE; PHOTOTRANSISTOR; PHOTOVOLTAIC CELL; RADIOMETRY; THERMOCOUPLE. Electronic circuitry. The voltage or current from the detector or detector array is amplified by special circuitry, usually consisting of metal oxide semiconductor field-effect transistors (MOSFETs) which are designed for low-temperature operations. The amplified signals are then handled very much like television signals. One important feature of most of these systems is that the system does not yield a direct response; only changes are recorded. Thus a “dc restore” or absolute level must be established with a thermal calibration source. The black level of the display can then be chosen by the operator. See TRANSISTOR. Reticle. A reticle or chopper is an important feature of nonimaging systems. A typical reticle and its use are shown in Fig. 4. An image of the scene is portrayed on a relatively large fraction of the reticle— anywhere from 10 to 100%. The lens just behind it collects all the light that passes through the reticle and focuses it on a detector. The reticle is rotated.
Fig. 2. Low-resolution transmission spectra of the atmosphere and of constituent gases.
If the scene is uniform, there will be no change in the detector output. However, if the scene has a point source (like the image of the hot exhaust of an aircraft engine), then a point image is formed on the reticle and a periodic detector output is generated. The phase and frequency of this output can be used with properly designed reticles to obtain the angular coordinates of the point source. The reticle pattern can also be used to reduce the modulation obtained from almost uniform scenes which radiate from large areas. Reticles are used in most infrared tracking systems, although other schemes are sometimes employed. Applications. Infrared techniques have been applied in military, medical, industrial, meteorological, ecological, forestry, agricultural, chemical, and other disciplines. Meteorological applications. Weather satellites use infrared imaging devices to map cloud patterns and provide the imagery seen in many weather reports. Substances at temperatures between 200 and 300 K (−100 and +80◦F) emit copious amounts of infrared radiation, but are not hot enough to emit in the visible. The Earth has a temperature of approximately 300 K (80◦F), and high-altitude clouds are colder (about 250 K or −10◦F). An infrared sensor placed
Fig. 3. High-resolution atmospheric transmission spectrum between 4.43 and 4.73 µm.
on a synchronous-orbit satellite can easily sense this difference and present a picture of these temperature patterns. Although the technique is not very widely known, radiometric sensors on lower-altitude satellites can determine the vertical temperature distribution along any path to the Earth. The infrared emission of the Earth's atmosphere is a function of wavelength; a space-borne radiometer senses the emitted radiation at several different wavelengths. The different wavelength bands correspond to greater atmospheric transmission, and some “look” deeper into the atmosphere than others. Calculations based on these facts determine the atmospheric temperature distribution that is required to produce the observed wavelength distribution. Although there can be problems with the uniqueness and convergence of such an inversion or fitting process, the results obtained have been accurate to within a few degrees and about a kilometer of altitude. See METEOROLOGICAL SATELLITES; REMOTE SENSING.
Medical applications. Infrared imaging devices have also been used for breast cancer screening and other medical diagnostic applications. In most of these applications, the underlying principle is that pathology produces inflammation, and these locations of increased temperature can be found with an infrared imager. The major disadvantage is that localized increases of about 1 K usually need to be detected, and a fair number of nonpathological factors (or at least factors other than the pathology in question) can cause equivalent heating. In addition to breast cancer detection (which detects 80% or more of cases and has about a 15% false-alarm rate), infrared techniques have been used in the analysis of severe burns, faulty circulation, back problems, and sinus ailments, and have even been proposed as a test for medical malingerers.
Airborne infrared imagers. Airborne infrared imagers have been used to locate the edge of burning areas in forest fires. Typically, a forest fire in the western United States is ignited by the lightning of a late afternoon thunderstorm; the valley becomes filled with smoke by the next morning when the crews arrive. The longer wavelengths of the emitted infrared radiation penetrate the smoke better than the visible wavelengths, so the edges of the fire are better delineated.
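The temperature contrast exploited by the meteorological sensors described above can be illustrated with a rough Planck-law calculation. The sketch below is my own illustration, not part of the article; the physical constants and the 8–12-µm band edges are standard values.

```python
import numpy as np

# Compare Planck band radiance of a ~300 K surface with ~250 K cloud tops
# in the 8-12 micrometer atmospheric window (illustration only).
h = 6.626e-34   # Planck constant, J s
c = 2.998e8     # speed of light, m/s
k = 1.381e-23   # Boltzmann constant, J/K

def planck(wavelength_m, temperature_k):
    """Spectral radiance, W m^-2 sr^-1 m^-1."""
    a = 2.0 * h * c**2 / wavelength_m**5
    return a / np.expm1(h * c / (wavelength_m * k * temperature_k))

lam = np.linspace(8e-6, 12e-6, 200)            # 8-12 um window
surface = np.trapz(planck(lam, 300.0), lam)    # band radiance, W m^-2 sr^-1
clouds = np.trapz(planck(lam, 250.0), lam)

print("Wien peak at 300 K:", 2.898e-3 / 300.0 * 1e6, "um")   # near 10 um
print("8-12 um band radiance ratio, 300 K / 250 K:", surface / clouds)
```

The band radiance of the warmer scene comes out several times larger than that of the cold cloud tops, which is why the contrast is easy for a radiometer to sense.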
Properties of materials for infrared instrumentation

Material                Transmission region, µm   Approximate refractive index   Comment
Germanium               2–20                      4                              Opaque when heated
Silicon                 2–15                      3.5                            Opaque when heated
Fused silica            0.3–2.5                   2.2                            Strong, hard
Zinc selenide           0.7–15                    2.4                            Expensive
Magnesium fluoride      0.7–14                    1.6                            Not very strong
Arsenic sulfur glass    0.7–12                    2.2                            Not always homogeneous
Diamond                 0.3–50                    1.7                            Small sizes only
Salt                    0.4–15                    1.5                            Attacked by moisture
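For quick screening of window materials, the transmission regions listed in the table can be put into a small lookup, as in the hypothetical sketch below. The numbers simply restate the table as transcribed above; the function name and structure are illustrative only.

```python
# Screen candidate window materials that cover a given wavelength band,
# using the transmission regions quoted in the table above (illustration only).
TRANSMISSION_RANGE_UM = {
    "germanium": (2, 20),
    "silicon": (2, 15),
    "fused silica": (0.3, 2.5),
    "zinc selenide": (0.7, 15),
    "magnesium fluoride": (0.7, 14),
    "arsenic sulfur glass": (0.7, 12),
    "diamond": (0.3, 50),
    "salt": (0.4, 15),
}

def candidates(band_um):
    """Materials whose quoted transmission region covers the whole band."""
    lo, hi = band_um
    return [m for m, (a, b) in TRANSMISSION_RANGE_UM.items() if a <= lo and hi <= b]

print(candidates((8, 12)))   # materials usable across the 8-12 um window
print(candidates((3, 5)))    # materials usable across the 3-5 um window
```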
Fig. 4. Reticle system. (a) Configuration of components of system: primary lens, reticle, relay lens, aperture in image plane, detector. (b) Reticle, showing area covered by image.
Thermal pollution carried by power-plant effluent into rivers has been detected at various locations in the United States, and assessment of crop viability from moisture content has also been accomplished with some degree of success. See INFRARED IMAGING DEVICES.
Infrared spectroscopy. Infrared spectrometers have long been a powerful tool in the hands of the analytical chemist. The spectrum of a substance in either absorption or emission provides an unmistakable “fingerprint.” The only problem is to find the spectrum of the substance that matches the unknown. See INFRARED SPECTROSCOPY.
Military and space applications. The best-known military techniques are probably the Sidewinder air-to-air missile, which detects the radiation from the exhaust of a jet aircraft, and the spaceborne systems which detect the extremely large amounts of radiation from the plume of launching intercontinental rockets at great distances. Infrared systems have also been used for stabilizing satellites by sensing the Earth's horizon, for night vision applications, and for perimeter warning and gas detection. William L. Wolfe
Bibliography. P. Klocek (ed.), Handbook of Infrared Optical Materials, 1991; J. L. Miller, Principles of Infrared Technology, 1994; M. Schlessinger, Infrared Technology Fundamentals, 2d ed., 1994; J. D. Vincent, Fundamentals of Infrared Detector Operation and Testing, 1990; W. L. Wolfe and G. J. Zissis, The Infrared Handbook, 1978, reprint 1985.

Infrared radiation (biology)
Infrared radiations occupy the span between the visible spectrum and radio waves, and encompass wavelengths 780–400,000 nanometers, neither boundary being precisely delimited. All bodies above absolute zero in temperature are potential sources of infrared radiations; therefore, all organisms are continually exposed to them. About 60% of the Sun’s rays are infrared. Water absorbs infrared radiations strongly, except for the band of transparency between 780 and 1400 nm. Since protoplasm contains much water, it absorbs infrared radiations readily. A large animal absorbs infrared radiations at its surface, only the span from 780 to 1400 nm penetrating as far as the blood vessels. See INFRARED RADIATION. While many substances and even tissues selectively absorb infrared rays, and one might therefore postulate selective effects of these radiations, none have been unequivocally demonstrated except possibly in conjunction with x-rays. The reason is perhaps because quanta of infrared radiation do not excite energy states in molecules other than those excited by conducted heat. See X-RAYS. The essential biological effect of infrared rays depends primarily upon the rise in temperature produced following their absorption, which in turn increases the rate of biological activities in proportion to the temperature change. Because of the prominence of infrared in sunlight, organisms show many adaptations to dissipate or to avoid the heat. The temperature of a large animal or plant may rise temporarily, but the heat is dissipated by transpiration in the plant and by perspiration in the animal. Submerged animals and plants are protected by the water, the temperature change of which depends upon the heat capacity of the particular body of water. Treatment of biological materials with infrared radiations (780–1150 nm) either before or after x-ray treatment increases the rearrangements in chromosomes induced by the x-ray in tissues of plants and animals tested. The way in which the infrared radiations do this is unknown, but a comparable amount of conducted heat does not have the same effect. Medical practitioners make use of infrared radiations to treat sprains, strains, bursitis, peripheral vascular diseases, arthritis, muscle pain, and many other pains for which heating gives relief, probably because of vasodilation of peripheral vessels. For this purpose, glow coil radiators are generally employed, but for some purposes, the more penetrating radiations (780–1400 nm) obtainable from incandescent sources are preferable to the glow coil to stimulate circulation deeper in the tissues. See BIOPHYSICS; THERMOTHERAPY. Arthur C. Giese Bibliography. J. R. Cameron and J. G. Skofronick, Medical Physics, 1978; W. R. Hendee, Medical Radiation Physics, 2d ed., 1979.
Infrared spectroscopy The spectroscopic study of the interaction of matter with infrared radiation. Electromagnetic waves from the long-wavelength limit of visible light at 800 nanometers to the shortest microwaves at 1 mm are used. In the wave-number units usually employed (oscillations per centimeter, read as reciprocal centimeters), this corresponds to 12,500–10 cm−1. See INFRARED RADIATION. Infrared regions. The infrared is conveniently divided into three regions, according to the different types of atomic and molecular energy levels that can be studied and the experimental techniques. Radiation in the near-infrared region, at 4000– 12,500 cm−1 (wavelengths of 2.5–0.8 micrometers), can excite harmonics of molecular vibrations as well as low-energy electronic transitions in molecules and crystals. The usual source is a quartz-envelope tungsten-halogen lamp, with detection by photomultipliers or photodiodes (for the shorter wavelengths) and lead sulfide (PbS) or indium antimonide (InSb) photoconductors. See INCANDESCENT LAMP; PHOTOELECTRIC DEVICES. The mid-infrared region, at 200–4000 cm−1 (50– 2.5 µm), is where most stretching and bending fundamental vibrations of molecules occur. Blackbody sources (Nernst glowers and globars) are used. Detectors are thermal (thermocouples, bolometers), pyroelectric (deuterated triglycine sulfate, DTGS), or photoconductive InSb, doped germanium, and mercury cadmium telluride (HgCdTe). See BOLOMETER; HEAT RADIATION; PHOTOCONDUCTIVITY; PYROELECTRICITY; THERMOCOUPLE. The far-infrared region, at 10–200 cm−1 (1000– 50 µm), is where rotational transitions in gaseous molecules; certain low-frequency bending, torsional, and skeletal vibrations; and lattice modes in solids are observed. Plasma emission from a high-pressure mercury arc is the usual source; DTGS and dopedgermanium bolometers are useful detectors, as is the Golay pneumatic cell. For still lower frequencies, microwave techniques are used. See CRYSTAL ABSORPTION SPECTRA; MICROWAVE SPECTROSCOPY; MOLECULAR STRUCTURE AND SPECTRA; RADIOMETRY. Development. Although infrared radiation was discovered in 1800, the development of spectroscopy
Fig. 1. Basic single-beam recording infrared prism spectrometer. The monochromator (the portion from the entrance slit S1 to the exit slit S2) is a Littrow mounting, a common arrangement for infrared instruments.
was slow until the 1880s. By then the bolometer detector had been invented and solar spectra recorded as far as 10 µm. The recognition that molecular vibrational frequencies are characteristic of specific chemical groups was emerging, based on studies in the near-infrared. Investigations in the early 1900s of the infrared spectra of organic compounds demonstrated their usefulness for identification, and established correlations between molecular structure and mid-infrared frequencies. The invention of the echelette or blazed grating, which concentrates most of the diffracted radiation into a single order, was a major experimental advance. After the rapid development of infrared and electronics technology during World War II, recording infrared spectrometers became available commercially, and were widely adopted by industrial analytical laboratories in the 1940s. Initially, these used prisms, but were soon followed by higher-performance grating instruments of increasing sophistication. The development and commercialization of Fourier-transform spectrometers and tunable infrared lasers have further expanded the capabilities of infrared spectroscopy.
Instrumentation and techniques. The broad wavelength range of infrared radiation, and the few transparent optical materials available, require that infrared instruments be designed with reflecting optics: radiation is focused with front-surface aluminized mirrors rather than lenses. Because of strong infrared absorption by water vapor and carbon dioxide, operation takes place in a vacuum or the optical path is purged with dry nitrogen. Absorption spectroscopy is the principal method, where attention centers on the frequencies absorbed by the sample from a broadly emitting source. However, spectrometers and interferometers can easily be adapted to emission spectroscopy.
Dispersive spectrometers. In Fig. 1, infrared radiation from the source Q is focused by a spherical mirror M2 onto the entrance slit S1 of the monochromator, after passing through the sample cell SC. The beam is collimated by the off-axis paraboloid mirror M3, dispersed by refraction through the prism P, and focused in the plane of the exit slit S2 by a second reflection from M3. A narrow spectral region of the dispersed radiation passes the exit slit and is focused by the ellipsoidal mirror M7 onto the detector D, which converts the radiant energy into an electrical signal. Since the beam has been chopped at constant frequency by a rotating mechanical chopper C, this signal is an alternating current that is amplified by the lock-in amplifier A, controlling the pen of the chart recorder R. To scan the spectrum, M4 is rotated by the drive mechanism DM, which also drives the recorder. Successive frequencies are thus moved across the exit slit, producing a record of signal intensity as a function of mirror position; with a proper mechanical linkage, this record is made linear in wavelength or wave number. The same arrangement can be used with a diffraction grating as the dispersive element; the prism is removed and mirror M4 is replaced by the grating.
Even with an empty sample cell, the signal from such a single-beam instrument is affected by atmospheric absorption, variations in source output, absorption and scattering from the cell, and so forth. This is avoided with double-beam spectrometers, which use a beam-switching system of rotating mirrors to compare (typically some 10 times per second) the energy transmitted by the sample to that passing through a reference beam (which may contain an empty cell), and plot the ratio. In the near-infrared, quartz prisms are used, and in the mid-infrared, alkali-metal halide or calcium fluoride (CaF2) prisms, but no prism material is suitable beyond about 50 µm. Diffraction gratings can be used in all regions with the advantage, for equivalent optical configurations, of significantly higher resolving power than prisms. Prism instruments have resolutions of little better than 1 cm−1 near wavelengths of maximum dispersion, and much poorer than this elsewhere. Grating resolution can be several tenths of a reciprocal centimeter for commercial spectrometers; some specially built instruments can resolve a few hundredths of a reciprocal centimeter. In most laboratories these instruments are being replaced by other techniques, although inexpensive double-beam grating spectrometers are still manufactured. See DIFFRACTION GRATING; OPTICAL PRISM; RESOLVING POWER (OPTICS).
Fourier-transform spectroscopy. In a Michelson interferometer, light from the source strikes a thin, partially reflecting beam splitter at an angle of 45◦ and is divided into two beams that are returned by mirrors and recombined at the beam splitter. The intensity of the recombined and interfering beams, recorded as a function of the optical path difference (or retardation) as one mirror is moved, yields an interferogram. From this, the desired spectrum (that is, the source intensity as a function of wave number) can be recovered by the mathematical procedure known as a Fourier transform. See FOURIER SERIES AND TRANSFORMS; INTEGRAL TRANSFORM; INTERFEROMETRY.
The components of a Fourier-transform spectrometer (FTS) are shown in Fig. 2. Radiation from the source Q is chopped at C, collimated by mirror M1, and divided at the beam splitter BS, with half sent to a fixed mirror M2 and half to a movable mirror M3. The recombined beams, with an optical-path difference 2L, pass through a sample cell SC and are focused by M4 onto the detector D. The signal is amplified (A) and stored in the digital computer DC. The interferogram is scanned by moving M3 from zero path difference (L = 0) to some distance Lmax; it is then Fourier-transformed and displayed as a spectrum on the recorder R. The spectral band pass of a Fourier-transform spectrometer is approximately 1/(2Lmax), and achieving a resolution of 0.1 cm−1, for example, requires a mechanically very reasonable 5-cm mirror movement. At sufficiently high mirror velocities (about 1 mm/s for mid-infrared spectra), the signal is automatically modulated at audio frequencies and can be amplified directly, eliminating the chopper with its 50% signal loss. See DIGITAL COMPUTER.
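The interferogram-to-spectrum step described above can be imitated numerically. The sketch below is my own illustration, not the article's code; the line positions and sampling are invented. It builds the cosine interferogram of a two-line source, recovers the lines with a fast Fourier transform, and echoes the 1/(2Lmax) resolution estimate for the 5-cm mirror movement mentioned above.

```python
import numpy as np

# Toy Fourier-transform spectrometer: two sharp lines, cosine interferogram,
# spectrum recovered by FFT.  Wave number in cm^-1, retardation in cm.
mirror_travel_cm = 5.0                      # the article's example: 5-cm movement
opd_max = 2.0 * mirror_travel_cm            # retardation is twice the mirror travel
n = 32768
opd = np.linspace(0.0, opd_max, n)
d = opd[1] - opd[0]

lines_cm = np.array([950.0, 1050.0])        # hypothetical line positions
interferogram = np.cos(2.0 * np.pi * np.outer(lines_cm, opd)).sum(axis=0)

spectrum = np.abs(np.fft.rfft(interferogram))
nu = np.fft.rfftfreq(n, d=d)                # cm^-1 axis of the transform

print("resolution ~ 1/(2 * mirror travel) =", 1.0 / (2.0 * mirror_travel_cm), "cm^-1")
print("recovered lines near:", nu[spectrum > 0.5 * spectrum.max()], "cm^-1")
```

The recovered peaks fall within one resolution element of the assumed line positions, which is the essence of the multiplexed measurement.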
Fig. 2. Michelson interferometer used as a Fourier-transform spectrometer.
Fourier-transform spectroscopy offers several advantages over dispersive methods; these are especially important for effective use of the limited radiant power available from most infrared sources. Whereas a spectrometer samples only one small frequency range at any given instant and must scan these frequencies sequentially to produce a spectrum, an interferometer processes information from all frequencies simultaneously (multiplex or Fellgett advantage). Furthermore, an interferometer passes a much greater light flux for a given resolving power than a spectrometer, which can accept only a very limited solid angle of source radiation because of the narrow slits required (throughput or Jacquinot advantage). These advantages can be translated into improvements of orders of magnitude in any one of the three interrelated important parameters of resolution, signal-to-noise ratio, and scan time. Another advantage is that the mirror movement is monitored to high precision with a fixed-frequency laser, so that the wave-number scale of the transformed spectrum is highly accurate compared with spectrometer readings.
Until the discovery of the Cooley-Tukey fast-Fourier-transform (FFT) algorithm in 1964, interferometric spectroscopy was largely impractical because of computational difficulties. The ready availability of inexpensive computers has now made it the accepted technique for high-performance infrared spectroscopy. Commercial Fourier-transform spectrometers are marketed with resolutions from a few reciprocal centimeters for routine qualitative analyses to research instruments that can resolve better than 0.002 cm−1. Typically, they operate from 4000 to 400 cm−1 on one beam splitter such as germanium-coated potassium bromide (KBr); this range can be extended broadly in either direction with different beam-splitter materials. For the far-infrared, Mylar films in various thicknesses are used. These instruments are controlled
by microprocessors and include a digital computer to handle the Fourier transform. This computing power allows data manipulation such as repetitive scanning and signal averaging; background subtraction; spectral smoothing, fitting, and scaling; and searching digitized spectral libraries and databases to identify unknowns. Although backgrounds can be subtracted with software, some Fourier-transform instruments are designed for true optical double-beam operation. Many offer rapid-scan capability for the study of short-lived species and can be adapted to recording Fourier-transform Raman spectra. See MICROPROCESSOR; RAMAN EFFECT.
Tunable infrared lasers. In most experiments the lower limit to the linewidths of molecular transitions in gases is the Doppler broadening, the width caused by the translational (thermal) motion of the molecules. In the mid-infrared this is typically 10−2–10−4 cm−1; better resolution than even interferometers provide is desirable for spectroscopy of gases. Continuously tunable and nearly monochromatic laser sources, introduced about 1970, have met this need. See LASER; LASER SPECTROSCOPY.
Characteristics of some tunable lasers are listed in Table 1. Many of these emit beyond the ranges listed, which are limited to regions where useful high-resolution spectroscopy has actually been demonstrated. Other techniques, such as four-wave mixing, vibronic transition lasers, and polariton lasers, have been little used for spectroscopy. Ideally, tunable infrared lasers require only a simple experimental arrangement: the source, a sample cell, and a detector with recorder. A low-resolution monochromator is often included to eliminate unwanted laser modes. In practice, many of these lasers continuously tune over very limited wavelength ranges, require constant adjustment, and tend to be complex and troublesome devices. However, they are indispensable when the highest resolution is required, and they have opened exciting new areas of infrared research. If a tunable laser has enough power to saturate molecular transitions, as do carbon dioxide (CO2) lasers, a technique known as saturation spectroscopy
can be used to record spectra below the Doppler width, and thus exploit the full potential of these sources. This ultrahigh resolution can reveal a wealth of spectral detail that is otherwise lost in the Doppler broadening. See DOPPLER EFFECT. Wave-number calibration. The accuracy of lowresolution instruments can be quickly checked by recording the spectra of standards such as polystyrene film or a solution of camphor and cyclohexanone in indene; these provide sharp absorption peaks with positions known within ±0.5 cm−1. For high-resolution spectra, tabulations are available giving the frequencies of thousands of rotation-vibration lines of light gaseous molecules such as hydrogen chloride (HCl), carbon monoxide (CO), water vapor (H2O), hydrogen cyanide (HCN), carbon dioxide (CO2), nitrous oxide (N2O), ammonia (NH3), and ethylene (C2H2), with accuracies of ±0.005 cm−1 or better. Sample handling. Gases are examined in simple cylindrical cells with infrared-transmitting windows; a typical path length is 10 cm. For trace species or weak absorptions, compact multiple-reflection White cells (named after their inventor) are made with adjustable optical path lengths up to 100 m (300 ft) and more. Liquid cells are available for sample thicknesses of 0.01 mm to several centimeters. Solids can be observed directly in thin sections; as mulls in mineral or fluorocarbon oils; as finely ground dispersions in disks of an infrared-transparent salt such as potassium bromide; or in solutions of solvents such as carbon tetrachloride (CCl4) or carbon disulfide (CS2) that have few infrared bands. (The usefulness of water is limited by its strong, broad infrared absorptions.) Many accessories have been designed for sampling problems. Infrared microscopes with reflecting optics can focus on objects as small as 10 µm to record the spectra of subnanogram samples; this is useful in particulate and fiber analysis and forensic work. Attenuated total reflection, where the infrared beam is transmitted inside a thin, highly refractive crystal [such as zinc selenide (ZnSe) or germanium] by multiple internal reflections, and is
TABLE 1. Tunable mid-infrared laser sources*

Device                                  Demonstrated spectral coverage, µm    Maximum resolution, cm−1    Typical continuous-wave power, W
Semiconductor diode lasers              3–27                                  2 × 10−6                    10−3
Gas lasers:
  Waveguide CO2                         9–11†                                 3 × 10−7                    1
  Zeeman-tuned neon or xenon            ∼3.5†, 7–8†                           1 × 10−3                    10−3, ?
Spin-flip Raman lasers                  5–6                                   1 × 10−6                    0.1
Color-center lasers                     1.5–4                                 3 × 10−8                    10−2
Optical parametric oscillators          1–16                                  <1 × 10−3                   10−3
Nonlinear optical mixing techniques:
  Difference-frequency generation       2–6                                   1 × 10−4                    10−6
  Tunable-sideband generation           ∼3.4†, 9–11†                          2 × 10−10, 3 × 10−6         ?, 10−3

*Adapted from R. S. McDowell, Infrared laser spectroscopy, Proc. SPIE, 380:494–507, 1983.
†Tunable only near discrete lines in these regions.
Fig. 3. Mid-infrared absorption spectra recorded on a Fourier-transform spectrometer at a resolution of 1 cm−1. (a) Polystyrene film, 0.05 mm thick. The weak intensity fluctuations between 2000 and 2800 cm−1 are interference fringes, caused by reflections from the surfaces of the plastic, and can be used to determine the film thickness. (b) Liquid acetone, capillary film between cesium iodide (CsI) plates. (c) Ethylene gas at about 30 torr (4 kilopascals) in a 10-cm (4-in.) cell. The absorptions marked W and C are due to traces of water vapor and carbon dioxide (CO2), respectively. Below 600 cm−1 the potassium chloride (KCl) windows absorb. (V. T. Hamilton, Los Alamos National Laboratory)
absorbed by a sample in optical contact with the crystal, may be appropriate for difficult materials such as gels and strongly absorbing liquids. Photoacoustic detectors can be used for nearly opaque samples. Other accessories provide spectra of specularly reflected light from coatings and films; of diffuse reflectance from inhomogeneous samples such as powders; of samples under high pressure in diamond-anvil cells; and so forth. Many devices are available for holding samples at cryogenic or elevated temperatures. See HIGH-PRESSURE PHYSICS; PHOTOACOUSTIC SPECTROSCOPY.
Typical spectra. Infrared spectra are usually plotted as percent transmittance T or absorbance A on a scale linear in wave number ν (less commonly, in wavelength λ). Transmittance is the ratio of the intensity of radiation transmitted by the sample (I) to that incident on the sample (I0), expressed as a percentage, so that T = 100I/I0; this is related to absorbance by Eq. (1).

A = log10 (I0/I) = log10 (100/T)    (1)

Transmittance and absorbance, of course, vary with frequency and should formally be expressed as T(ν) and A(ν). Examples of mid-infrared spectra recorded on a commercial instrument are shown in Fig. 3. The very strong absorption in Fig. 3b (liquid acetone) at 1705–1725 cm−1 is characteristic of the carbonyl C=O stretch in saturated aliphatic ketones. The rotational lines in Fig. 3c (ethylene gas) between 850 and 1050 cm−1 arise from quantized changes in the amount of rotational excitation that occurs simultaneously with the CH2 out-of-plane “wagging” vibrational transition at 949 cm−1. A similar structure is seen on the C–H stretch between 3050 and 3200 cm−1, and would be evident in all bands at higher resolution. In liquids and solids, this rotational structure is suppressed.
A single rotation–vibration band of a gaseous molecule, the infrared-active stretch (designated ν3) of sulfur hexafluoride (SF6), is shown in Fig. 4 at four resolutions from 2 to 10−6 cm−1, to illustrate the additional detail revealed at higher resolution. The emission linewidth of the tunable semiconductor diode laser used to record Fig. 4c was less than 10−5 cm−1, but the effective resolution is the sulfur hexafluoride Doppler width of 0.0010 cm−1. Figure 4d is a sub-Doppler saturation spectrum taken inside the linewidth of a carbon dioxide (CO2) laser line at an effective resolution of less than 10−6 cm−1. This was recorded in the derivative mode, where the ordinate is dA/dν, the derivative of A with respect to ν, instead of A or T. Many thousands of features like those in Fig. 4c and d have been assigned to specific sulfur hexafluoride transitions.
Applications. Among the more important applications of infrared spectroscopy are chemical analysis and the determination of molecular structures.
Qualitative analysis. Infrared spectra are ideal for identifying chemical compounds because every molecule [except homonuclear diatomics such as nitrogen (N2), oxygen (O2), and chlorine (Cl2)] has an infrared spectrum. Since the vibrational frequencies depend upon the masses of the constituent atoms and the strengths and geometry of the chemical bonds, the spectrum of every molecule is unique (except for optical isomers). Pure unknowns can thus be identified by comparing their spectra with recorded spectra; catalogs are available in digitized versions, and searches can be made rapidly by computer. Simple mixtures can be identified with the help of computer software that subtracts the spectrum of a pure compound from that of the unknown mixture. More complex mixtures may require fractionation
Fig. 4. S–F stretching fundamental ν3 of sulfur hexafluoride (SF6) as it appears with increasing resolving power. (a) Fourier-transform spectrum at resolution of 2 cm−1. (b) Fourier-transform spectrum at resolution of 0.06 cm−1. (c) Expansion of small portion of the strong central peak in part b, recorded with a tunable semiconductor diode laser at an effective resolution of 0.0010 cm−1. (d) Further resolution of the absorption line at 947.742 cm−1 into five strong components with sub-Doppler saturation spectroscopy at an effective resolution of less than 10−6 cm−1.

Integration

Suppose that f(x) is integrable on a + ε ≤ x ≤ b for every ε > 0, but is unbounded on [a, b]. Then by definition, Eq. (23) is written, provided the limit on the right exists.

∫_a^b f(x) dx = lim_{ε→0} ∫_{a+ε}^b f(x) dx    (23)

For example, if f(x) = x^(−2/3), relation (24) follows.

∫_0^1 x^(−2/3) dx = lim_{ε→0} ∫_ε^1 x^(−2/3) dx = lim_{ε→0} 3(1 − ε^(1/3)) = 3    (24)

Similarly, Eq. (25) follows.

∫_{−1}^0 x^(−2/3) dx = lim_{ε→0} ∫_{−1}^{−ε} x^(−2/3) dx = lim_{ε→0} 3[(−ε)^(1/3) + 1] = 3    (25)

When f(x) is integrable on every finite subinterval of the real axis, by definition Eqs. (26) and (27) hold, provided the limits on the right exist.

∫_a^{+∞} f(x) dx = lim_{b→+∞} ∫_a^b f(x) dx    (26)

∫_{−∞}^a f(x) dx = lim_{b→−∞} ∫_b^a f(x) dx    (27)

More general cases are treated by dividing the real axis into pieces, each of which satisfies one of the conditions just specified. Equation (28) is an example.

∫_{−∞}^{+∞} x^(−5/3) dx = lim_{a→−∞} ∫_a^{−1} x^(−5/3) dx + lim_{δ→0} ∫_{−1}^{−δ} x^(−5/3) dx + lim_{ε→0} ∫_ε^1 x^(−5/3) dx + lim_{b→+∞} ∫_1^b x^(−5/3) dx    (28)

Because the second and third of the limits on the right do not exist, integral (29) does not exist.

∫_{−∞}^{+∞} x^(−5/3) dx    (29)

However, it is sometimes useful to assign a value to it, called the Cauchy principal value, by replacing definition (28) by Eq. (30). In this particular case the value is zero.

∫_{−∞}^{+∞} x^(−5/3) dx = lim_{ε→0} [∫_{−1}^{−ε} x^(−5/3) dx + ∫_ε^1 x^(−5/3) dx] + lim_{b→+∞} [∫_{−b}^{−1} x^(−5/3) dx + ∫_1^b x^(−5/3) dx]    (30)

The preceding definitions apply to cases when the integrand f(x) is bounded, except in arbitrarily small neighborhoods of a finite set of points. In the closing years of the nineteenth century various extensions were made to more general cases. Then Henri Lebesgue produced a comparatively simple general theory for the case of absolutely convergent integrals. The integral of Lebesgue will be discussed below. The integral of Denjoy includes both nonabsolutely convergent integrals and the integral of Lebesgue.
Multiple integrals. The concept called the Riemann integral can be extended to functions of several variables. The case of a function of two variables illustrates sufficiently the additional features which arise. To begin with, let f(x, y) denote a real function defined on a rectangle of the form (31).

R: a ≤ x ≤ b,  c ≤ y ≤ d    (31)

Let P be a partition of R into n nonoverlapping rectangles Ri with areas Ai, and let (xi, yi) be a point of Ri. Define S by Eq. (32).

S = Σ_{i=1}^{n} f(xi, yi) Ai    (32)

In case the sum S tends to a definite limit I when the maximum diagonal of a rectangle Ri tends to zero, then f is said to be integrable over R, and the limit I is called the Riemann integral of f over R. It will be denoted here by the abbreviated symbol ∫_R f. Properties corresponding to those numbered (i) to (iv) can be proved for these double integrals, except that property (ii) should now read as follows: (ii′) If the rectangle R is the union of two rectangles S and T, then Eq. (33) can be written.

∫_R f = ∫_S f + ∫_T f    (33)

Necessary and sufficient conditions for a function f(x, y) to be integrable over R can be stated as follows: f is bounded on R; and the set of points where f is discontinuous can be enclosed in a series of rectangles, the sum of whose areas is arbitrarily near to zero. To define the integral of a function f(x, y) over a more general domain D where it is defined, suppose D is enclosed in a rectangle R, and define F(x, y) by Eqs. (34).

F(x, y) = f(x, y) in D,  F(x, y) = 0 outside D    (34)

Then f is integrable over D in case F is integrable over R, and Eq. (35) holds, by definition.

∫_D f = ∫_R F    (35)

When the function f(x, y) is continuous on D, and D is defined by inequalities of the form (36), where the functions α(x) and β(x) are continuous, the double integral of f over D always exists, and may be represented in terms of two simple integrals by Eq. (37).

a ≤ x ≤ b,  α(x) ≤ y ≤ β(x)    (36)

∫_D f = ∫_a^b [ ∫_{α(x)}^{β(x)} f(x, y) dy ] dx    (37)

In many cases, this formula makes possible the evaluation of the double integral. Improper multiple integrals have been defined in a variety of ways. There is not space to discuss these definitions and their relations here.
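The formulas above lend themselves to a quick numerical check with elementary Riemann sums. The sketch below is my own illustration, not part of the article; the sample integrands and domain are chosen only to mirror Eq. (24) and the iterated form of Eq. (37).

```python
import numpy as np

def midpoint_integral(f, a, b, n=200000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    x = a + (np.arange(n) + 0.5) * (b - a) / n
    return float(np.sum(f(x)) * (b - a) / n)

# Eq. (24): the improper integral of x**(-2/3) over (0, 1] approaches 3
for eps in (1e-2, 1e-4, 1e-6):
    val = midpoint_integral(lambda x: x**(-2.0 / 3.0), eps, 1.0)
    print("eps =", eps, " integral from eps to 1 =", val)   # tends toward 3

# Eq. (37): f(x, y) = x*y over D = {0 <= x <= 1, 0 <= y <= x}, exact value 1/8
n = 1000
xs = (np.arange(n) + 0.5) / n
total = 0.0
for x in xs:
    total += midpoint_integral(lambda y: x * y, 0.0, x, n=1000) * (1.0 / n)
print("iterated double integral =", total)   # approximately 0.125
```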
Line, surface, and volume integrals. A general discussion of curves and surfaces requires an extended treatise. The following outline is restricted to the simplest cases. A curve may be defined as a continuous image of an open interval. Thus a curve C in the xy plane is given by a pair of continuous functions of one variable, Eq. (38), whereas a curve in space is given by a triple of such functions.

x = f(u),  y = g(u)    (38)

If α(x) is a step function, with α(x) = ci on each interval u_{i−1} < x < u_i, where u_0 = a, u_n = b, and a < u_i < b for i = 1, . . . , n − 1, and if f(x) is continuous at each u_i, then Eq. (89) may be written.

∫_a^b f(x) dα(x) = Σ_{i=1}^{n−1} f(u_i) [c_{i+1} − c_i]    (89)

With some restrictions to ensure convergence, this may be extended to the case in which α has infinitely many discontinuities. In the case in which α is nondecreasing, the Stieltjes integral has an extension which is similar to that of the Lebesgue integral and is called the Lebesgue-Stieltjes integral. Other extensions of the Lebesgue integral are the integrals of Denjoy. They apply to certain functions f for which |f| is not integrable, and include the improper integrals, but not the Cauchy principal value. Numerous other types of integrals have been defined. In particular the integral of Lebesgue has been extended to cases in which the independent variable lies in a suitable space of infinitely many dimensions, or in which the functional values of f lie in such a space. See CALCULUS; FOURIER SERIES AND TRANSFORMS; SERIES. Lawrence M. Graves
Bibliography. S.-B. Chae, Lebesgue Integration, 2d ed., 1995; H. B. Dwight, Tables of Integrals and Other Mathematical Data, 4th ed., 1961; I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, 6th ed., 2000; P. R. Halmos, Measure Theory, 1950, reprint 1991; J. L. Kelley and T. P. Srinivasan, Measure and Integral, 1987; E. J. McShane, Unified Integration, 1983; W. Rudin, Real and Complex Analysis, 1987; C. W. Schwartz, Measure, Integration and Function Spaces, 1994; D. W. Stroock, A Concise Introduction to the Theory of Integration, 3d ed., 1998.
Integumentary patterns All the features of the skin and its appendages that are arranged in designs, both in humans and other animals. Examples are scales, hairs, and feathers; coloration; and epidermal ridges of the fingers, palms, and feet. In its common usage, the term applies to the configurations of epidermal ridges, collectively named dermatoglyphics. Dermatoglyphics are characteristic of primates. See EPIDERMAL RIDGES; FEATHER; HAIR; SCALE (ZOOLOGY). The superficial ridges are associated with a specific inner organization of skin. Skin is composed of two chief layers, the epidermis on the outside and the dermis underlying it (Fig. 1). These two layers are mortised by pegs of dermis, a double row of pegs corresponding to each ridge; these pegs accordingly form a patterning like that of the ridges. Ridge patterns. The patterning of ridges, including that of the epidermal-dermal mortising, is determined during the third and fourth fetal months. All characteristics of single ridges and of their alignments are then determined definitively. Ridge alignments reflect directions of stress in growth of the hand and foot at the critical period of ridge differentiation. An important element in the production of localized patterns, for example, on the terminal segment of each digit, is the development in the fetus of a series of elevations, the volar pads. Volar pads. The pads are homologs of the prominent pads on the paws of some mammals, but in primates they attain little elevation and soon tend to subside. The volar pads are disposed in a consistent topographic plan. Localized patterns have the same placement because growth of the pad is the determiner of the specific local pattern. When a pad has subsided before ridges are formed, its area does not present a localized pattern, and the ridges follow essentially straight, parallel courses. Variations in contours of the pads are accompanied by wide variations in the designs formed by the ridges overlying them. Pattern variability. Variability of patterning is a major feature of dermatoglyphics and the basis for
Fig. 1. Structure of ridged skin showing the chief two layers (labeled features: pores, epidermis, dermis; scale mark 2 mm). (From H. Cummins and C. Midlo, Finger Prints, Palms and Soles, McGraw-Hill, 1943)
various applications (Fig. 2). In personal identification, prints (Fig. 3), customarily of fingers, are classified for filing in accordance with variables of pattern type and counts of ridges. Systematic filing makes it possible to locate readily and compare the sets of prints corresponding in classification to a set for which an identification is sought. In anthropological and medical investigations, groups of individuals are compared statistically in reference to the occurrence of these variables. Deductions may be drawn in accordance with likeness or unlikeness in the directions of variation. A few examples are cited. Trends of inheritance have been demonstrated in family groups and in comparisons of the two types of twins, fraternal (two-egg) and identical (one-egg). Dermatoglyphics thus are useful in diagnosing the types of twins and in analyzing cases of questioned paternity. Among different racial groups, the similar or discrepant trends of variation have been used to analyze racial affinities. Trends of variation are unlike in right and left hands, and the fact that they differ in accordance with functional handedness indicates an inborn predisposition of handedness. Departures from normal trends occur in many conditions associated with chromosomal aberrations, such as Down syndrome. See FINGERPRINT; HUMAN GENETICS; SKIN. Harold Cummins
Bibliography. M. S. Elbauly and J. D. Schindler, Handbook of Clinical Dermatoglyphs, 1971; R. D. Olsen, Sr., Scott’s Fingerprint Mechanics, 1978; R. I. Spearman, The Integument: A Textbook of Skin Biology, 1973.
Fig. 2. Dermatoglyphics of palm and sole. (After H. Cummins and C. Midlo, Finger Prints, Palms and Soles, McGraw-Hill, 1943)
Fig. 3. Fingerprints. (a) Whorl pattern. (b) Loop pattern. (From H. Cummins and C. Midlo, Finger Prints, Palms and Soles, McGraw-Hill, 1943)

Intelligence
General mental ability due to the integrative and adaptive functions of the brain that permit complex, unstereotyped, purposive responses to novel or changing situations, involving discrimination, generalization, learning, concept formation, inference, mental manipulation of memories, images, words and abstract symbols, eduction of relations and correlates, reasoning, and problem solving. Theory. Verbal definitions or descriptions of intelligence, of examples of intelligent behavior, or of the characteristics of tests that best measure intelligence are all inadequate for a scientific understanding of intelligence. This is because “intelligence” is not a denotative noun, but a hypothetical construct or theory that is needed to account for one of the most striking and best-established phenomena in all of psychology: the ubiquitous positive intercorrelations among measurements of individual differences on all tests of “mental ability,” however diverse. By “mental” is meant only that individual differences in the test scores are not the result of individual differences in sensory acuity or motor dexterity and coordination. By “ability” is meant conscious, voluntary performance on a task (for example, a test item) for which it can be reliably determined whether or not the performance meets some objective standard of correctness or efficiency. The British psychologist Charles E. Spearman was the first to discover that all mental tests show positive correlations with one another when the scores are obtained in a representative sample of the general population. From this he inferred that all mental tests of every kind must measure some factor of ability that they all measure in common. He termed this common factor “general mental ability” and gave it the label g. Spearman, and most psychologists following him, have identified “general intelligence” with g. By means of factor analysis, a quantitative method developed by Spearman for analyzing a matrix of all of the intercorrelations among a number of tests, it became possible to determine the proportion of the total variation in all of the test scores that can be attributed to variation in the general factor g common to all of the tests, and to determine the degree to which each test measured the g factor. A test’s g loading, as derived from factor analysis, can be thought of as the test’s coefficient of correlation with the g factor. Factor analyses of more than a hundred varieties of tests or test items reveal a wide range of g loadings. By comparing the tests having high, moderate, and low g loadings, one can gain an inkling of the nature of g. It cannot be identified with any specific type of informational content or skill required by the test. In these respects, highly g-loaded items are as diverse as all items in general. For example,
tests of vocabulary, problem arithmetic, and block designs (copying designs with multicolored blocks) are all highly g-loaded (and by their very nature are highly correlated with one another), whereas tests of spelling ability and arithmetic computation are only moderately g-loaded. Spearman, after inspecting hundreds of types of tests in relation to their g loadings, concluded that the only common features of the most highly g-loaded tests are that they involve “the eduction of relations and correlates” and “abstractness.” In more general terms, it is found that g is related to the complexity of the mental operations that are required by the task. Tasks involving reasoning and problem solving are much more g-loaded than tasks involving rote learning, routine skills, or the recall of specific information. When diverse tests are rank-ordered by their g loadings, they fall into four groups, from highest to lowest g loadings; the test items in these groups can be characterized as relational, associative, perceptual, and sensory-motor. As yet there is no satisfactory or generally accepted theory of the neural mechanisms responsible for g or for individual differences in g, although there has been no paucity of speculation involving such notions as the total neural energy available to the brain, the number of brain cells and synaptic connections, and the speed of neural conduction. The sexes, on average, do not differ in g; however, females slightly excel in verbal ability, and males slightly excel in spatial and quantitative abilities. Factor analysis has revealed about 30 other factors of mental ability besides g. The most important of these factors, in terms of the amount of individual variation they account for, are verbal ability, numerical ability, spatial-visualization ability, and memory. A number of persons all with the same level of general intelligence or g may differ markedly from one another in these other ability factors. See FACTOR ANALYSIS.
Unity or modularity of intelligence. Some of the highly correlated abilities identified as factors probably represent what are also referred to as modules. Modules are distinct, innate brain structures that have developed in the course of human evolution. They are especially characterized by the various ways that information or knowledge is represented by the neural activity of the brain. Thus, the main modules are linguistic (verbal, auditory, lexical, and semantic), visuo-spatial, object recognition, numerical-mathematical, musical, and kinesthetic. Although modules generally exist in all normal individuals, they are most strikingly evident in those with highly localized brain damage or pathology, and in idiot savants. Savants evince striking discrepancies between a particular narrow ability and nearly all other abilities, often showing the greatest inconsistency with the individual’s low level of general ability. There are some savants who are too mentally retarded to take care of themselves, yet can perform feats of mental calculation, or play the piano by ear, or memorize pages of a telephone directory, or draw objects from memory with nearly photographic accuracy. The modularity of these abilities is evinced by the fact that rarely, if ever, is more than one of them seen in a given savant. In contrast, there are individuals whose tested general level of ability is within the normal range, yet who, because of a localized brain lesion, show a severe deficiency in some particular ability, such as face recognition; receptive or expressive language dysfunctions (aphasia); or inability to form long-term memories of events. Again, modularity is evinced by the fact that these functional deficiencies are quite isolated from the individual’s total repertoire of abilities. In individuals with a normally intact brain, a module’s efficiency can be narrowly enhanced through extensive experience and practice in the particular domain served by the module.
Measurement. Intelligence tests are diverse collections of g-loaded tasks (or items), graded in difficulty. The person’s performance on each item can be objectively scored (for example, pass or fail); the total number of items passed is called the raw score. Raw scores are converted to some form of scaled scores which can be given a statistical interpretation. The first practical intelligence test for children, devised in 1905 by the French psychologist Alfred Binet, converted raw scores to a scale of “mental age,” defined as the raw score obtained by the average of all children of a given age. Mental age (MA) divided by chronological age (CA) yields the well-known intelligence quotient or IQ. When multiplied by 100 (to get rid of the decimal), the average IQ at every age is therefore 100, with a standard deviation of approximately 15 or 16. Because raw scores on mental tests increase linearly with age only up to about 16 years, the conversion of raw scores to a mental-age scale beyond age 16 must resort to statistical artifices. Because of this problem and the difficulty of constructing mental-age scales which preserve exactly the same standard deviation of IQs at every age, all modern tests have abandoned the mental-age concept and the calculation of IQ from the ratio of MA to CA. Nowadays the IQ is simply a standardized score with a population mean of 100 and a standard deviation (σ) of 15 at every age from early childhood into adulthood. The population distribution of IQ conforms closely to the normal curve in the range of ±2σ from the mean (that is, from IQ 70 to 130), but there are excesses beyond that range. The middle 50%, considered “average,” fall between IQs of 90 and 110. IQs below 70 generally indicate “mental retardation,” and above 130, “giftedness.” The vast majority of students in selective colleges range above IQ 110, and graduates of 4-year colleges average about 120–125. See MENTAL RETARDATION.
Correlates of IQ. The IQ, as a measure of g, gains its importance from its substantial correlations with many educationally, occupationally, and socially valued variables. No other single measurement that can be obtained on a child (including parents’ education and socioeconomic status) better predicts the child’s scholastic performance. Childhood IQ, because of its relationship to later educational attainment, also predicts adult occupational status.
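The IQ bands quoted above follow directly from the normal-curve model. The short sketch below is an illustration of my own, assuming an exactly normal distribution with mean 100 and standard deviation 15 (the article notes that the real distribution departs from normality beyond ±2σ).

```python
from math import erf, sqrt

def normal_cdf(x, mean=100.0, sd=15.0):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

print("fraction between 90 and 110:", normal_cdf(110) - normal_cdf(90))  # ~0.50
print("fraction below 70:", normal_cdf(70))                              # ~0.023
print("fraction above 130:", 1.0 - normal_cdf(130))                      # ~0.023
```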
The fact that the IQ correlates with physical variables such as brain size and the amplitude and speed of evoked electrical potentials of the brain indicates that IQ tests reflect something biological, and not merely knowledge or skills acquired in school or in a cultured home.
Genetics. Individual differences in IQ involve polygenic inheritance, much as other continuous characteristics such as height and weight. The proportion of the total phenotypic variance in the population attributable to genetic factors, termed heritability, can be estimated by the methods of quantitative genetics from the correlations between various kinships—identical and fraternal twins, siblings, parent-child, and adopted children. Virtually all of the methodologically defensible estimates of the heritability of IQ range between .50 and .80, with a median value of about .70. Thus genetic factors are about twice as important as environmental factors as a cause of individual variation in intelligence. All levels of intelligence are found within every major racial group. There is no definitive scientific evidence or consensus on the relative contributions of genetic and environmental factors to any undisputed average differences in intelligence between races. See BEHAVIOR GENETICS. Arthur R. Jensen
Bibliography. M. Anderson, Intelligence and Development: A Cognitive Theory, 1992; N. Brody, Intelligence, 2d ed., 1997; J. B. Carroll, Human Cognitive Abilities: A Survey of Factor-Analytic Studies, 1993; D. K. Detterman (ed.), Current Topics in Human Intelligence, vol. 2: Is Mind Modular or Unitary?, 1992; J. A. Fodor, The Modularity of Mind, 1983; R. J. Sternberg (ed.), Encyclopedia of Intelligence, 1994.
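As a rough illustration of the factor-analytic picture described earlier in this article, the first principal component of a small correlation matrix can serve as a stand-in for the g factor; proper factor analysis differs in detail, since it models each test's unique variance separately. The matrix entries and the four-test battery below are invented for the example.

```python
import numpy as np

# Hypothetical correlation matrix for four mental tests (invented numbers).
R = np.array([
    [1.00, 0.60, 0.55, 0.40],
    [0.60, 1.00, 0.50, 0.35],
    [0.55, 0.50, 1.00, 0.30],
    [0.40, 0.35, 0.30, 1.00],
])

eigvals, eigvecs = np.linalg.eigh(R)          # eigenvalues in ascending order
v = eigvecs[:, -1]                            # first principal component
loadings = np.sqrt(eigvals[-1]) * np.abs(v)   # approximate "g loadings"
share = eigvals[-1] / R.shape[0]              # variance attributable to the factor

print("approximate g loadings:", np.round(loadings, 2))
print("share of total variance:", round(share, 2))
```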
Intelligent machine Any machine that can accomplish its specific task in the presence of uncertainty and variability in its environment. The machine’s ability to monitor its environment and then adjust its actions based on what it has sensed is a prerequisite for intelligence. The term intelligent machine is an anthropomorphism in that intelligence is defined by the criterion that the actions would appear intelligent if a person were to do it. A precise, unambiguous, and commonly held definition of intelligence does not exist. Examples of intelligent machines include industrial robots equipped with sensors, computers equipped with speech recognition and voice synthesis, self-guided vehicles relying on vision rather than on marked roadways, and so-called smart weapons, which are capable of target identification. These varied systems include three major subsystems: sensors, actuators, and control. The class of computer programs known as expert systems is included with intelligent machines, even though the sensory input and output functions are simply character-oriented communications. The complexity of control and the mimicking of human deductive and logic skills makes expert systems central in the realm of intelligent
Fig. 1. Flow of information and data in a typical intelligent machine (task specification, planning, system control, motion control, interpretation, sensors, actuators, physical world).
machines. See EXPERT SYSTEMS; GUIDANCE SYSTEMS; ROBOTICS; VOICE RESPONSE. Intelligent control. Since the physical embodiment of the machine or the particular task performed by the machine does not mark it as intelligent, the appearance of intelligence must come from the nature of the control or decision-making process that the machine performs. Given the centrality of control to any form of intelligent machine, intelligent control is the essence of an intelligent machine. The control function accepts several kinds of data, including the specification for the task to be performed and the current state of the task from the sensors. The control function then computes the signals needed to accomplish the task. When the task is completed, this also must be recognized and the controller must signal the supervisor that it is ready for the next assignment (Fig. 1). Automatic, feedback, or regulatory systems such as thermostats, automobile cruise controls, and photoelectric door openers are not considered intelligent machines. Several important concepts separate these simple feedback and control systems from intelligent control. While examples could be derived from any of the classes of intelligent machines, robots will be used here to illustrate five concepts that are typical of intelligent control. (1) An intelligent control system typically deals with many sources of information about its state and the state of its environment. A robot may contain a vision system and tactile sensors in addition to the internal position sensors that allow calculation of its spatial location. Given multiple sources of information, the intelligent control function evaluates them as needed based on the nature of the functions that must be performed at any instant. (2) An intelligent control system can accommodate incomplete or inconsistent information. For example, a robot expecting to pick up a single object from a conveyor belt may be confronted with two overlapping objects or no objects. (3) Intelligent control is characterized by the use of heuristic methods in addition to algorithmic control methods. A
heuristic is a rule of thumb, a particular solution or strategy for solving a problem that is applicable only over very limited ranges of the input parameters. A possible heuristic for solving a particular robot-implemented insertion task would be to move the part incrementally along an outward spiral path, retrying the insertion if the original attempt should fail. This heuristic could be repeated many times until an area large compared to positional uncertainties was covered. Clearly, such a heuristic is applicable only to a very small fraction of robot tasks. (4) An intelligent machine has a built-in knowledge base that it can use to deal with infrequent or unplanned events. An intelligent robot would have mechanisms, both sensors and heuristics, ensuring its own and its operator’s safety. (5) An algorithmic control approach assumes that all relevant data for making decisions is available. The heuristic approach is predicated on the knowledge that all relevant data cannot be made available, even in principle, and that the control system will have to resolve ambiguous cases. See ADAPTIVE CONTROL; CONTROL SYSTEMS.
Intelligent robots. To clarify these very general concepts and to illustrate them with more detail, a sensor-equipped robot performing an electronics assembly task will be described. The task is to pick a part from an egg-carton-like pallet, insert it into a test fixture, and, if it tests acceptably, insert it into the appropriate place on the printed wiring board being assembled. A human worker, especially one with some experience in production line assembly, could be instructed in this task in a few minutes and would also know, without further instruction, what actions to take for the many exceptional conditions that can arise in this type of activity. Outlining the steps that must be taken to program a robot to accomplish the same task will illustrate the amount of prior knowledge possessed by a person and the general level of complexity of programming in an environment where all relevant data and actions must be stated completely and unambiguously. The robot program will be given in verbal outline form rather than coded in a particular robot programming language. The numbers are used both for specifying branches in the flow of control and for referencing in the discussions that follow.
1. Wait for a printed wiring board to enter the assembly station.
2. Take an image of the parts presentation pallet: an egg-carton arrangement that separates and loosely orients the parts.
3. Locate a part in the image.
4. If the pallet is empty, move it to the stacking area and signal the conveyor to move a fresh pallet into position. Resume at step 2.
5. If a part is present, pick up the part and position it in the visual inspection station. Take an image of the bottom of the part. Determine the location and orientation of the pin field.
6. If an incorrect pin field is seen, deposit the part in the reject bin and resume at step 2.
7. If the pin field is correct, orient the robot hand based on the observed orientation of the part and insert the part into the electrical test station.
8. Signal the electrical test to begin.
9. If the test fails, remove the part from the test fixture and deposit it in the reject bin. Then resume at step 2.
10. If the electrical test was passed, regrasp the part to ensure it is accurately held by the robot. Remove it from the test fixture and move it to a point just above the insertion location on the printed wiring board.
11. Insert the part in the printed wiring board by moving it vertically downward while monitoring the force. When the vertical force reaches a specified value, stop the robot’s vertical motion.
12. Check the vertical position of the robot hand when it stopped moving. If it is at a height indicating that the part was inserted, release the part, retract the robot hand, and resume at step 1.
13. If the part was not inserted correctly, move the robot hand in the plane of the printed wiring board by a small increment along a prescribed path and resume at step 11.
Several comments can be made about a robot that is following the above program and the program itself. Such a robot would appear to be intelligent given the definition in the beginning of the article. There is variability in the robot’s environment, and it is using a variety of sensory data to guide its actions. However, the program is incomplete. There are many things that can go wrong that have not been accounted for: If a hole in the printed wiring board for the part is missing, the insertion cannot be successfully accomplished. In this case, program steps 11, 12, and 13 form an apparent infinite loop. Many other similar problems lurk in this program.
The program also presupposes significant capabilities and data in the robot and related vision system. The vision system and robot must be calibrated; that is, a procedure must be periodically executed that calculates the transformations between the robot and vision system coordinates. The vision system must be programmed to recognize first what the part looks like in the context of the pallet (step 3) and second what a correct pin field looks like (step 5). The precise locations of all points of reference for the robot must be determined. The search strategy to be followed if the part is not inserted correctly (step 13) must be carefully defined. The calibration and teaching programs required to accomplish these steps may be as complex as the primary task.
Execution of the assembly program and all of the implied subsidiary functions defines the control capability of the robot. The threshold of intelligent control is reached when the system can deal with many of the infrequent and exceptional conditions without requiring outside intervention or assistance. A second part of intelligent control is the development of the detailed programs given a high-level statement of the task requirements. This is the planning function (Fig. 1).
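A minimal rendering of the verbal program above in executable form might look like the following sketch. Every object and method name here (robot, vision, tester, conveyor, and their calls) is a hypothetical stand-in for whatever interfaces a real controller would expose, and the spiral search is deliberately bounded to avoid the infinite loop noted above.

```python
# Sketch of the 13-step assembly program (illustration only; all interfaces
# are hypothetical placeholders, not an actual robot API).
def assemble(robot, vision, tester, conveyor):
    while True:
        conveyor.wait_for_board()                          # step 1
        while True:                                        # handle one part
            pallet_image = vision.image_pallet()           # step 2
            part_location = vision.find_part(pallet_image) # step 3
            if part_location is None:                      # step 4: pallet empty
                robot.stack_empty_pallet()
                conveyor.feed_new_pallet()
                continue                                   # resume at step 2
            robot.pick(part_location)                      # step 5
            pins = vision.inspect_pin_field(robot)
            if not pins.correct:                           # step 6
                robot.drop_in_reject_bin()
                continue
            robot.insert_in_test_fixture(pins.orientation) # step 7
            if not tester.run():                           # steps 8-9
                robot.remove_from_test_fixture()
                robot.drop_in_reject_bin()
                continue
            robot.regrasp()                                # step 10
            robot.move_above_insertion_point()
            for offset in robot.spiral_search():           # bounded search
                robot.move_down_until_force_limit()        # step 11
                if robot.part_is_seated():                 # step 12
                    robot.release()
                    robot.retract()
                    break
                robot.move_in_board_plane(offset)          # step 13
            break                                          # back to step 1
```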
Control hierarchy. The control function in an intelligent machine is generally implemented as a hierarchy of processes. The upper levels are responsible for global decision-making processes and for the planning phases of the task. The lower levels implement critical time-dependent subtasks. In Fig. 2 the interconnection of a group of processors forming a vision-controlled robot is diagrammed. The manipulator arm is itself controlled by a two-level hierarchy. The lower level consists of a processor that performs the servo function for each joint of the arm. The robot controller converts world coordinates supplied by the system controller into the values for each joint required to place the robot hand in the desired location. The robot controller also divides complete movement specifications into a series of short steps that result in coordinated motion and controlled velocity of the robot. See SERVOMECHANISM.
The control program outlined in the robot example would be executed on the system-level control processor. In addition to the vision, manipulator, and safety subsystems, the system-level controller would also be the interface with the robot operator and the factory communications system. The partitioning of subtasks onto autonomous communicating processors is a natural and economical way to obtain increased system performance by using multiple microprocessors. See MICROPROCESSOR.
Computer vision. A computer vision system interprets the contents of a large array (256 × 256 is common) of intensity values that is the computer representation of a frame of a television image. "Interpret" in this context means to find, recognize, or inspect a prespecified object or group of objects. The first processing step is to isolate or segment the image of the object from the background image. The most common technique in use is to create a high-contrast image, typically by backlighting the object of interest, which allows a simple threshold to be used for the segmentation. Objects in the image are found at those locations where the intensity value is less than some prespecified constant. Lower-contrast images, such as a part in a pallet, require segmentation techniques that rely on rapid changes in brightness (edges) or differences in intensity between two images, or techniques that look for specific visual features by using any of a large number of matched filter techniques. The amount of computation required to segment a low-contrast image is significantly greater than that required by thresholding. Once an object image is segmented from the background, a set of parameters known as features is calculated. The set of features, which may include the object area, perimeter, number of holes, and any other enumerable parameter, is compared to a model of the object either to ascertain its identity or to compare it to some inspection criteria.
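For the backlit, high-contrast case described above, segmentation and feature calculation reduce to a few array operations. The following sketch uses only numpy; the threshold value, the synthetic test image, and the particular features computed (area, centroid, bounding box) are assumptions made for illustration.

```python
import numpy as np

def segment_and_measure(image, threshold):
    """Segment a backlit object by thresholding and compute a few simple
    features of the kind that would be compared against an object model."""
    mask = image < threshold              # object pixels are darker than the backlight
    area = int(mask.sum())                # feature: object area in pixels
    if area == 0:
        return None                       # nothing found below the threshold
    ys, xs = np.nonzero(mask)
    return {
        "area": area,
        "centroid": (float(ys.mean()), float(xs.mean())),                    # location estimate
        "bbox": (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())) # extent
    }

# Synthetic 256 x 256 image containing one dark rectangular part
img = np.full((256, 256), 200, dtype=np.uint8)
img[100:140, 80:150] = 30
print(segment_and_measure(img, threshold=128))
```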
Fig. 2. Interconnection of processors forming a sensor-based robot. The diagram shows the system controller linked to the factory communications system, an operator terminal, other equipment, a safety monitor, a vision system with cameras, and a robot controller driving the joint servos of the manipulator.
The specific set of features needed for a specific visual recognition problem depends on the exact appearances of the objects in complex ways, with the result that the feature set is normally determined empirically. Similarly, orientation and location quantities are also computed for the segmented image. Generally, an image represents a two-dimensional projection of a three-dimensional object. Correction to the perceived shape and location may have to be made based on its location relative to the viewing camera. No matter how complex the techniques used in its construction, a computer vision system is invariably implemented with a dedicated processor. See COMPUTER VISION.
Interconnection of machines. Another trend expected in the area of intelligent machines is their interconnection into cooperating groups to address larger tasks. Standardization of information exchange protocols and of languages describing various fields of work is a prerequisite to any extensive development of interconnected, cooperating systems.
Capabilities. Slow, steady improvements may be expected in intelligent machine capabilities. Progress will result from advances in microprocessors and the implementation of various functions as very large-scale integrated (VLSI) circuits. See INTEGRATED CIRCUITS.
Expert systems. The control schemes outlined above are algorithmic even if the resulting system behavior appears intelligent. The explicit programming methods described do not lend themselves to situations where several alternative actions could be rationally considered. In the explicit programming of a complex task, there is very little opportunity to use default actions for many infrequent conditions. The method of expert systems is expected to provide a paradigm for a better way of robot task planning and
execution. Expert systems have a knowledge base and an inference procedure for obtaining a result consistent with the contents of the knowledge base, given an arbitrary set of input conditions. The knowledge base is represented by a collection of condition-action statements. These statements are called rules, and the inference procedure is known as the rule interpreter. The other major components of an expert system (discussed below) are a user interface, a database, and a rule structure. The primary difficulties are in obtaining an adequate collection of rules for the task and in defining an appropriate strategy for resolving inconsistencies or ambiguities when applying rules to a given problem. A specific robot task would be programmed by adding a set of rules, describing the particular operation to be performed, to an existing, and probably much larger, set of rules that defined the default and exception-handling behavior of the robot. John F. Jarvis
Applications. Since the mid-1970s, expert systems have been continuously developed and applied by industry, business, and commerce. Expert systems are the most successful implementation of artificial intelligence. Applications include medical diagnosis, mineral analysis, control of complex processes, and fault diagnosis and monitoring; systems for fault diagnosis and monitoring account for about half the applications. See ARTIFICIAL INTELLIGENCE.
Goal-directed systems. In many expert systems, the rule interpreter, or inference engine as it is sometimes called, uses an exhaustive backward-chaining strategy, in which the rules are expressed in a highly focused "IF-condition-THEN-result" form, the result is known or concluded, and, knowing the answer, the system searches backward for the rule whose condition produced the result. Such an expert system is termed goal-directed, since the search is, in fact, directed toward a goal (the known result). Goal-directed systems are therefore the result of going from the general to the particular. An inference mechanism based on a backward-chaining strategy is commonly equipped with an extensive explanation facility to reduce the amount of conflict resolution needed. To be user-friendly, an expert system must also possess a user interface that interrogates the human expert in a manner that is unambiguous. The information obtained from the user interface must then be stored as facts in a database, and the relationship between individual facts must be stated. The rule structure determines which rule should be examined next, in a forward-chaining expert system, or it further questions the user in a backward-chaining system. Finally, the representational structure must also include an uncertainty-handling mechanism that is based on measurements of belief. See COMPUTER; DATABASE MANAGEMENT SYSTEM; FUZZY SETS AND SYSTEMS.
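A minimal illustration of condition-action rules and an exhaustive backward-chaining interpreter is sketched below. The rules and facts are invented for the example; a practical system would add the user interface, database relationships, and uncertainty handling just described.

```python
# Each rule is a pair: (list of condition facts, concluded fact)
RULES = [
    (["part_present", "pin_field_correct"], "part_ok_to_test"),
    (["part_ok_to_test", "electrical_test_passed"], "insert_part"),
]

FACTS = {"part_present", "pin_field_correct", "electrical_test_passed"}

def backward_chain(goal, facts, rules):
    """Work backward from a goal to the rules whose conditions support it."""
    if goal in facts:                      # the goal is already a known fact
        return True
    for conditions, result in rules:       # exhaustive search over the rule base
        if result == goal and all(backward_chain(c, facts, rules) for c in conditions):
            return True
    return False

print(backward_chain("insert_part", FACTS, RULES))   # -> True
```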
While the use of rules having the form "IF-condition-THEN-result," known as production rules, has been particularly successful when used in goal-directed applications such as diagnostics, such rules have limitations. The major limitations associated with production rules are the so-called knowledge acquisition bottleneck and the difficulty of determining the correct method of representing the rules when there is a large number of them. In order to overcome these limitations, alternative representations have been attempted. Production rules have been represented in the form of inference nets, causal mechanisms, and frame-based abduction mechanisms. The inference net is overlaid by a semantic net to connect different objects. As with production rules, an inference-net approach can be backward-chained, but it requires a different mechanism for handling uncertainty. For inference nets, a mechanism based on Bayes' rule, a statistical rule, has been used. However, if an expert system is constructed by using causal or frame-based abduction mechanisms, it must have inference and control mechanisms suited to those representations. Also, the relational mechanism for storing facts in the database must alter accordingly. For causal nets the database is divided into findings and hypotheses. In frame-based abduction the database holds domain knowledge in the form of descriptive frames of information. Also, with frame-based abduction the inference mechanism uses hypothesize-and-test cycles. See ESTIMATION THEORY.
Machine learning. The difficulties surrounding production rules, including knowledge acquisition, descriptive knowledge representation, and context handling, have led to the development of a subdivision of expert system technology known as machine learning, which is a method for automatically generating rules from examples. Thus, machine learning is an approach based on the idea of going from the particular to the general. Machine learning can be classified into three areas: artificial-intelligence–type learning, neural nets, and genetic algorithms. An effective machine-learning system must use sampled data to generate internal updates, be capable of explaining its findings in an understandable way (for example, symbolically), and communicate its explanation to a human expert in a manner that improves the human's understanding. Expert systems based on artificial-intelligence learning have surpassed the performance of human experts, and they have been able to communicate what they have learned to human experts for the purpose of verification. There are two schools of artificial-intelligence–type learning. One is based on logic and attempts to find minimal representations of finite functions, while the other concentrates on learning via knowledge representation, acquisition, compaction, and learning from example. The theoretical base of machine learning is multivariate analysis in statistics, and the use of decision trees. Attribute-based learning systems use a common form of knowledge representation to learn explicit rules. Input attributes can be discrete, continuous, or tree-structured, or they can be example sets that contain attribute values.
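The attribute-selection step at the heart of attribute-based learning is usually information-theoretic. The sketch below is a simplified, assumed version of the recursive tree-building procedure described later in this section, not any specific published algorithm; it scores each attribute of a small invented training set by information gain and picks the most significant one.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(examples, attribute, class_attr="class"):
    """Reduction in entropy obtained by partitioning on one attribute."""
    base = entropy([e[class_attr] for e in examples])
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy([e[class_attr] for e in subset])
    return base - remainder

# Invented training set: classify whether a part passes inspection
examples = [
    {"pins": "ok",   "surface": "clean", "class": "pass"},
    {"pins": "ok",   "surface": "dirty", "class": "pass"},
    {"pins": "bent", "surface": "clean", "class": "fail"},
    {"pins": "bent", "surface": "dirty", "class": "fail"},
]
best = max(["pins", "surface"], key=lambda a: information_gain(examples, a))
print(best)   # -> "pins": the most significant attribute, on which the set would be partitioned
```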
In example sets, each distinguished attribute is the classification attribute for each example. The output is a decision rule that is capable of classifying further, unseen examples that are presented to the system. See DECISION THEORY; STATISTICS.
Artificial-intelligence–type learning originated from an investigation into the possibility of using decision trees or production rules for concept representation. Subsequently the work was extended to use decision trees and production rules in order to handle the most conventional data types, including those with noisy data, and to function as a knowledge acquisition tool. To construct a rule from a training set, the algorithm must first examine all the examples in the training set to see if they are all of the same class. If this is the case, classification is achieved. If the examples in the training set are not all of the same class, then the most significant attribute is selected from the training set, the set is partitioned into two sets, and subtrees are constructed. The evaluation function used for selecting an attribute is based on information theory, and recursive methods are used to construct the subtrees. Based on this approach, in 1978 the first expert system generated exclusively by inductive learning from example was used to diagnose 19 soybean diseases with a degree of accuracy higher than that of a human expert.
Interest in neural-net (connectionist) learning algorithms stems from their ability to construct a network model based on training examples. When finally constructed, the neural network becomes the knowledge base for an expert system. Neural networks are capable of automatically generating behavior based on example sets of the types described in the discussion of artificial-intelligence–type learning above. Their major limitation is that they are useful only in diagnostic systems that have fixed input and output sets. See NEURAL NETWORK.
Genetic algorithms are based on iteration. They use general-purpose search strategies that have their foundations in genetics, in particular, in the process of natural selection. The purpose of iteration is to weed out weaker representations and give strengthened support to those that provide better results. Genetic algorithms attempt to balance exploration with exploitation, yet at the same time they try to avoid the problems associated with other goal-directed methods. For example, they attempt to avoid the local-minima problems found in hill-climbing methods and the inefficiencies of random search. Since they make no prior assumption about the problem domain, they can be applied generally, and, in particular, in areas whose characteristics are unknown. They have been applied successfully in domains where optimization is a problem, but they started to be used for learning systems only in the late 1980s. The iterative procedure of a genetic algorithm maintains a set of knowledge structures, called a population. Each structure is a string of symbols chosen from a fixed set of examples; this is a candidate solution. At each generation each structure is evaluated and is given a fitness value that is dependent on how close it is to the required solution.
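The generate-evaluate-select cycle just described can be illustrated with a deliberately tiny genetic algorithm. The bit-string representation, the counting-ones fitness function, and all parameter values below are invented purely for illustration.

```python
import random

random.seed(1)
POP_SIZE, LENGTH, GENERATIONS, MUTATION = 20, 16, 40, 0.02

def fitness(bits):                      # toy fitness: how close the string is to all ones
    return sum(bits)

def select(population):                 # tournament selection keeps the better of two strings
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):                  # single-point crossover of two parent strings
    point = random.randrange(1, LENGTH)
    return p1[:point] + p2[point:]

def mutate(bits):                       # occasional bit flips preserve exploration
    return [b ^ 1 if random.random() < MUTATION else b for b in bits]

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]
print(fitness(max(population, key=fitness)))   # fitness climbs toward LENGTH over the generations
```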
While genetic algorithms and neural nets have contributed to the study of machine learning, a disadvantage of these methods is that learning is a slow process when they are used. See EXPERT SYSTEMS. Edward Grant
Bibliography. D. A. Bradley et al., Mechatronics and the Design of Intelligent Mechanics and Systems, 2000; R. Kurzweil, Age of Intelligent Machines, 1992; M. W. Spong, S. Hutchinson, and M. Vidyasagar, Robot Modeling and Control, 2005; G. Zobrist and C. Y. Ho, Intelligent Systems and Robotics, 2000.
Intercalation compounds Crystalline or partially crystalline solids consisting of a host lattice containing voids into which guest atoms or molecules are inserted. Candidate hosts for intercalation reactions may be classified by the number of directions (0 to 3) along which the lattice is strongly bonded and thus unaffected by the intercalation reaction. Isotropic, three-dimensional lattices (including many oxides and zeolites) contain large voids that can accept multiple guest atoms or molecules. Layer-type, two-dimensional lattices (graphite and clays) swell up perpendicular to the layers when the guest atoms enter (illus. a). The chains in one-dimensional structures (polymers such as polyacetylene) rotate cooperatively about their axes during the intercalation reaction to form channels that are occupied by the guest atoms (illus. b). In the intercalation family based on solid C60 (buckminsterfullerene), the zero-dimensional host lattice consists of 60-atom carbon clusters with strong internal bonding but weak intercluster bonding. These clusters pack together like hard 1-nm-diameter spheres, creating interstitial voids which are large enough to accept most elements in the periodic table (illus. c). The proportions of guest and host atoms may be varied continuously in many of these materials, which are therefore not true compounds. Many ternary and quaternary substances, containing two or three distinct guest species, are known. The guest may be an atom or inorganic molecule (such as an alkali metal, halogen, or metal halide), an organic molecule (for example, an aromatic such as benzene, pyridine, or ammonia), or both. See CRYSTAL STRUCTURE; FULLERENE; GRAPHITE; POLYMER; ZEOLITE.
Competing interactions. The stability of a particular guest-host configuration depends on the balance between positive and negative contributions to the formation energy of the so-called compound. For example, in carbon-based hosts such as graphite, both guest and host become ionized after intercalation, and the attraction between oppositely charged guest and host ions helps to offset the elastic energy penalty associated with the swelling of the lattice. Some of these contributions vary with temperature, leading to structural phase transitions such as internal melting of the guest layers in graphite intercalation compounds. In some cases the guest layers
Typical intercalation compounds. (a) Potassium in graphite, KC8, a prototype layer intercalate. (b) Sodium in polyacetylene, [Na0.13(CH)]x, where x denotes infinitely repeating polymer chains. (c) Potassium in solid buckminsterfullerene, K3C60. The internal structure of the fullerene molecule is not shown. (After J. E. Fischer and P. Bernier, Les cristaux des fullerènes, La Recherche, 24(250):46–55, January 1993)
can be far enough apart to be considered independent, providing important prototypes for studying phenomena such as magnetism, superconductivity, and phase transitions in two dimensions. The spatial distribution of guest atoms or molecules within the host network can be regular or disordered, depending on temperature, concentration, and the chemical nature of the guest. See COHESION (PHYSICS); SOLID-STATE CHEMISTRY.
An example of temperature-dependent behavior is exhibited by alkali atoms inserted in graphite. If precisely half the number of alkali atoms that could be accommodated at the limit of close-packed alkali monolayers is added between each pair of graphite sheets, the system displays a phase transition. At low temperature, the elastic penalty will be minimized if the alkali ions are crowded into only half the galleries, resulting in a periodic dense-stage-2 configuration: maximally dense alkali monolayers between every other pair of host sheets. At high temperature the mixing entropy between filled and empty sites drives a transition to a site-disordered dilute-stage-1 configuration, in which all galleries contain some ions, on average 50% of the hard-sphere limit. The dense-stage-2 configuration may be recovered by applying hydrostatic pressure to the high-temperature configuration, because the volume is minimized by segregating the filled and empty sites into completely filled and empty galleries respectively. See ENTROPY; PHASE TRANSITIONS; THERMODYNAMIC PRINCIPLES.
Properties. The physical properties of the host are often dramatically altered by intercalation. In carbon-based hosts such as graphite, polyacetylene, and solid C60, the charge donated to the host by the guest is delocalized in at least one direction, leading to large enhancements in electrical conductivity. Some
hosts can be intercalated by electron-donating or -accepting guests, leading respectively to n-type or p-type synthetic metals, analogous to doped semiconductors. The chemical bonding between guest and host can be exploited to fine tune the chemical reactivity of the guest. In many cases, superconductors can be created out of nonsuperconducting constituents. See SEMICONDUCTOR; SUPERCONDUCTIVITY. Applications. Many applications of intercalation compounds derive from the reversibility of the intercalation reaction. The best-known example is pottery: Water intercalated between the silicate sheets makes wet clay plastic, while driving the water out during firing results in a dense, hard, durable material. Many intercalation compounds are good ionic conductors and are thus useful as electrodes in batteries and fuel cells. A technology for lightweight rechargeable batteries employs lithium ions which shuttle back and forth between two different intercalation electrodes as the battery is charged and discharged: vanadium oxide (three-dimensional) and graphite (two-dimensional). Zeolites containing metal atoms remain sufficiently porous to serve as catalysts for gas-phase reactions. Many compounds can be used as convenient storage media, releasing the guest molecules in a controlled manner by mild heating. See BATTERY; CLAY, COMMERCIAL; FUEL CELL; SOLID-STATE BATTERY. J. E. Fischer Bibliography. P. Bernier et al. (eds.), Chemical Physics of Intercalation II, 1993; T. Enoki, M. Suzuki, and M. Endo, Graphite Intercalation Compounds and Applications, 2003; C. Julien, J. P. Pereira-Ramos, and A. Momchilov (eds.), New Trends in Intercalation Compounds for Energy Storage, 2002; A. P. Legrand and S. Flandrois (eds.), Chemical Physics of Intercalation, 1987.
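As a numerical footnote to the battery application mentioned above, the charge a graphite intercalation electrode can store follows directly from its limiting composition. Taking the commonly quoted limit of one lithium ion per six carbon atoms (LiC6), the theoretical specific capacity works out as follows; the composition and constants are standard handbook values, not taken from this article.

```python
FARADAY = 96485.0        # coulombs per mole of electrons
M_CARBON = 12.011        # g/mol

# One Li+ ion (one electron) stored per six carbon atoms in LiC6
charge_per_gram = FARADAY / (6 * M_CARBON)    # C per gram of carbon host
capacity_mAh_per_g = charge_per_gram / 3.6    # 1 mAh = 3.6 C
print(round(capacity_mAh_per_g))              # ~372 mAh per gram of graphite
```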
Intercommunicating systems (intercoms) A privately owned system that allows voice communication between a limited number of locations, usually within a relatively small area, such as a building, office, or residence. Intercommunicating systems are generally known as intercoms. Intercom systems can vary widely in complexity, features, and technology. Though limited in size and scope, intercom systems can provide easy and reliable communication for their users.
Wire systems. An extremely simple intercom is a two-station arrangement in which one station is connected to the other via a dedicated wire. Simple twisted-pair cabling is the most common type of wiring for this type of intercom. An example in a business application is an executive-secretary system, in which the executive presses a button to signal the secretary and initiate the connection. When the secretary answers, the voice link is automatically established. Other systems have multiple stations, as many as 10 to 20, any of which can connect with any other station. The user must dial a one- or two-digit code to signal the intended destination. When the called party answers, the talk path is established between the two stations. In a residence, this type of system permits calling between extension telephones through abbreviated dialing codes; the called extension alerts the user through a distinctive ringing signal.
Telephone intercoms. Still other intercom systems work in conjunction with key and hybrid key telephone/private branch exchange telephone systems. They support internal station-to-station calling rather than access to outside lines. Many intercoms possess some or all of the features of that particular telephone system, such as call hold, call pickup, and conferencing. See KEY TELEPHONE SYSTEM; PRIVATE BRANCH EXCHANGE. Normally the telephone intercom is incorporated in the same telephone instrument that is used to access the public switched network. Two primary types of intercom systems exist: the single-link and the multilink. The single-link system provides a single dedicated intercom talk path in the telephone system, no matter how many stations might utilize the intercom system. The multilink intercom system provides two or more dedicated intercom paths that allow more than one simultaneous intercom conversation to be in progress. See SWITCHING SYSTEMS (COMMUNICATIONS); TELEPHONE SERVICE.
Wireless intercoms. A third type of intercom is the wireless intercom system for intrabuilding communications, which consists of a base unit radio transmitter, equipped with an antenna, and a number of roving units tuned to different frequencies. The base can selectively communicate with individual roving units by dialing the code corresponding to each roving unit's specific frequency. Commercial systems usually have a range of 1–2 mi (2–3 km). Structural components of the building, such as steel or concrete, can interfere with transmission quality and limit effective range. See MOBILE COMMUNICATIONS. Byron W. Battles; Vincent F. Rafferty
Bibliography. The Aries Group·MPSG, Glossary of Terminology: Key Telephone Systems and Hybrid KTS/PBX Systems, 1990; Center for Communications Management, Inc., Telecommunications Dictionary and Fact Book, 1984; G. Langley, Telephony's Dictionary, 2d ed., 1986.
Interface of phases The boundary between any two phases. Among the three phases, gas, liquid, and solid, five types of interfaces are possible: gas-liquid, gas-solid, liquid-liquid, liquid-solid, and solid-solid. The abrupt transition from one phase to another at these boundaries, even though subject to the kinetic effects of molecular motion, is statistically a surface only one or two molecules thick. A unique property of the surfaces of the phases that adjoin at an interface is the surface energy, which is the result of unbalanced molecular fields existing at the surfaces of the two phases. Within the bulk of a given phase, the intermolecular forces are uniform because each molecule enjoys a statistically homogeneous field produced by neighboring molecules of the same substance. Molecules in the surface of a phase, however, are bounded on one side by an entirely different environment, with the result that there are intermolecular forces that then tend to pull these surface molecules toward the bulk of the phase. A drop of water, as a result, tends to assume a spherical shape in order to reduce the surface area of the droplet to a minimum.
Surface energy. At an interface, there will be a difference in the tendencies for each phase to attract its own molecules. Consequently, there is always a minimum in the free energy of the surfaces at an interface, the net amount of which is called the interfacial energy. At the water-air interface, for example, the difference in molecular fields in the water and air surfaces accounts for the interfacial energy of 7.2 × 10−6 joule/cm2 (11.15 × 10−6 cal/in.2) of interfacial surface. The interfacial energy between the two liquids, benzene and water, is 3.5 × 10−6 J/cm2 (5.4 × 10−6 cal/in.2), and between ethyl ether and mercury is 37.9 × 10−6 J/cm2 (58.67 × 10−6 cal/in.2). These interfacial energies are also expressed as surface tension in units of millinewtons per meter. The surface energy at an interface may be altered by the addition of solutes that migrate to the surface and modify the molecular forces there, or the surface energy may be changed by converting the planar interfacial boundary to a curved surface. Both the theoretical and practical implications of this change in surface energy are embodied in the Kelvin equation, Eq. (1),
ln (P/P0) = 2Mγ/(RTρr)        (1)
where P/P0 is the ratio of the vapor pressure of a liquid droplet with radius r to the vapor pressure of the pure liquid in bulk, ρ the density, γ the surface energy, and M the molecular weight. Thus, the smaller the droplet the greater the relative vapor pressure, and as a consequence, small droplets of liquid evaporate more rapidly than larger ones. The surface energy of solids is also a function of their size, and the Kelvin equation can be modified to describe the greater solubility of small particles compared to that of larger particles of the same solid. See ADSORPTION; SURFACTANT.
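A quick numerical check of Eq. (1): for a water droplet near room temperature, with handbook values for the surface tension, molecular weight, and density assumed below, the vapor-pressure enhancement becomes appreciable only for radii of a few tens of nanometers and below.

```python
import math

R = 8.314          # gas constant, J/(mol K)
T = 298.0          # temperature, K
M = 0.018          # molecular weight of water, kg/mol
gamma = 0.072      # surface tension of water near room temperature, N/m
rho = 1000.0       # density of water, kg/m^3

def kelvin_ratio(radius_m):
    """P/P0 for a droplet of the given radius, from Eq. (1)."""
    return math.exp(2.0 * M * gamma / (R * T * rho * radius_m))

for r_nm in (100.0, 10.0, 1.0):
    print(f"r = {r_nm:6.1f} nm  ->  P/P0 = {kelvin_ratio(r_nm * 1e-9):.3f}")
# r = 100 nm -> about 1.01; r = 10 nm -> about 1.11; r = 1 nm -> about 2.8
```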
Contact angle. At liquid-solid interfaces, where the confluence of the two phases is usually termed wetting, a critical factor called the contact angle is involved. A drop of water placed on a paraffin surface, for example, retains a globular shape, whereas the same drop of water placed on a clean glass surface spreads out into a thin layer. In the first instance, the contact angle is practically 180°, and in the second instance, it is practically 0°. The study of contact angles reveals the interplay of interfacial energies at three boundaries. The illustration is a schematic representation of the cross section of a drop of liquid on a solid. There are solid-liquid, solid-gas, and liquid-gas interfaces that meet in a linear zone at O. The forces about O that determine the equilibrium contact angle are related to each other according to Eq. (2),
γSG = γSL + γLG cos θ        (2)
where the γ terms represent free energies at the interfaces and θ is the contact angle. Since only γLG and θ can be measured readily, the term adhesion tension is defined by Eq. (3).
γLG cos θ = γSG − γSL = adhesion tension        (3)
Adhesion tension, which is the free energy of wetting, is of critical importance in detergency, dispersion of powders and pigments, lubrication, adhesion, and spreading processes.
Contact angle θ at interface of three phases.
The measurement of interfacial energies is made directly only upon liquid-gas and liquid-liquid interfaces. In measuring the liquid-gas interfacial energy (surface tension), the methods of capillary rise, drop weight or pendant drop, bubble pressure, sessile drops, du Noüy ring, vibrating jets, and ultrasonic action are among those used. There is a small but appreciable temperature effect upon surface tension, and this property is used to determine small differences in the surface tension of a liquid by placing the two ends of a liquid column in a capillary tube whose two ends are at different temperatures. The determination of interfacial energies at other types of interfaces can be inferred only by indirect methods. See FLOTATION; FOAM; FREE ENERGY; PHASE EQUILIBRIUM; SURFACE TENSION. Wendell H. Slabaugh
Bibliography. A. W. Adamson and A. P. Gast, Physical Chemistry of Surfaces, 6th ed., 1997; H.-J. Butt, K. Graf, and M. Kappl, Physics and Chemistry of Interfaces, 2d ed., 2006.
Interference filters Optical filters that are based on the interference of optical radiation in thin films—the phenomenon responsible for the iridescent colors observed in soap bubbles, oil slicks, shells, bird feathers, and butterfly wings. An optical filter is any material or device that is used to change the intensity, spectral distribution, phase, or polarization properties of the transmitted or reflected components of the optical radiation incident upon it. Large and complicated instruments, such as interferometers and spectrophotometers, are not usually included within this definition. Some of the many other physical principles that have been used in the past to produce optical filters are absorption, reflection, scattering, diffraction, and polarization. The term "coating" is commonly used as a synonym for "filter" and, historically, has been applied to thin-film filters of certain functionalities. See COLOR FILTER; INTERFERENCE OF WAVES.
Filters and coatings. Filters and coatings based on thin-film interference are remarkably versatile. Although most are currently produced for the visible and adjacent parts of the spectrum, thin-film interference filters are also made for spectral regions extending from the x-ray to submillimeter wavelengths. Thus, the ratio of the highest to the lowest wavelengths for which optical interference coatings are made is of the order of 1:10⁶. See ELECTROMAGNETIC RADIATION.
In addition to their use in optical instruments for science and engineering, thin-film interference filters are used in eyeglasses and photographic cameras, energy conversion and conservation, architecture, communications, and display devices. New applications for optical interference filters appear regularly. One disadvantage of filters based on thin-film interference is that unless special measures are taken, their properties change with the angle of incidence of light. In some cases, even that can be turned into an advantage. For example, it is possible to tune the filters by tilting them, and the iridescent property of the thin-film coatings can be used for protection from counterfeiting as well as for decorative purposes.
Thin-film thicknesses and materials. Optical thin films are mostly made of inorganic materials, including dielectrics (such as Al2O3, CaF2, HfO2, MgF2, PbF2, Sc2O3, SiO2, TiO2, Y2O3, ZnS, and ZrO), semiconductors (such as GaAs, Ge, and Si), metals (such as Ag, Al, Au, Cr, Ni, and Mo), and alloys (such as the Ni-Cr-Fe alloys, Inconel and Nichrome), although organic materials are increasingly being used. The thicknesses of the transparent films vary, but they are usually within one or two orders of magnitude of the wavelengths
Fig. 1. Schematic representation of a homogeneous multilayer coating.
for which the filter is to be effective. The absolute thickness will vary a great deal, depending on whether the filter is intended for the x-ray, visible, or submillimeter spectral regions. For a filter designed for the visible part of the spectrum, a typical thickness might be around 100 nanometers. The thicknesses of metal and alloy layers within a multilayer system have to be thin enough that the layers are substantially transmitting. Metal layer thicknesses are typically around 5 nm. Deposition processes. The most precise optical interference filters are currently produced by physical deposition processes, such as thermal or electronbeam evaporation with or without ion assist, by ion plating methods, by various forms of direct-current (dc), radio-frequency (rf), or alternating-current (ac) magnetron sputtering, or by ion-beam sputtering. All these methods require expensive vacuum equipment. Less precise coatings can be deposited by chemical vapor deposition or from a liquid phase. Equipment for these processes is usually considerably cheaper, and is used for depositing thin-film filters for consumer products whenever the accuracy is adequate. When choosing the deposition process, other considerations are whether it lends itself for the manufacture of large areas of coatings with uniform thickness, whether the properties of the resulting coatings are chemically stable and do not change with time, whether the produced coating’s electrical properties are damaged by the high power of laser systems, and whether the mechanical properties of the coatings will withstand the expected wear and tear. Meeting the nonoptical requirements is often more challenging than obtaining the required optical performance. See CHEMICAL VAPOR DEPOSITION; LASER; PHYSICAL VAPOR DEPOSITION; RADIATION DAMAGE TO MATERIALS; SPUTTERING. Theory. In its simplest manifestation, a coating can consist of a single layer deposited on a glass surface. Some of the light incident on this film will be reflected from the air–film interface, some from the
glass–film interface, and the remainder will be transmitted into the glass. Depending on the value of the refractive index of the coating material relative to that of the glass and the relative thickness of the film to the wavelength of the incident light, constructive or destructive interference will take place between the two reflected electromagnetic waves, and the reflectance of the glass–air interface onto which the layer has been deposited can be enhanced or reduced. For such a simple system, it is possible to write down explicit expressions for the resulting reflectance. But most applications require filters that consist not of one thin film but tens or hundreds of thin films. A schematic representation of an optical thin-film interference filter is shown in Fig. 1. The filter consists of l layers deposited onto a substrate of refractive index ns and immersed in a medium of refractive index nm. The parameters available to the thin-film designer to achieve the desired performance include ns, nm, and the refractive indices ni, as well as extinction coefficients ki, and thicknesses di of the l layers. Clearly, the more complicated the required spectral performance of a filter is, the greater the number of layers l and the higher the overall thickness of the multilayer coating that is needed to yield the required result. For accurate results during the design process, it is necessary to take into account not only the primary reflections from each interface but also the effects of multiply reflected beams from the various interfaces. The performance of such systems is best evaluated with the aid of computer programs that are based on Maxwell’s equations for the propagation of electromagnetic waves. Depending on the design method used, the solution to the problem may consist of a single, thick inhomogeneous layer in which the optical constants vary gradually as a function of distance from the substrate, or of a multilayer system consisting of many homogeneous layers of two or more different materials. The latter are more readily implemented. See ABSORPTION OF ELECTROMAGNETIC RADIATION; MAXWELL’S EQUATIONS; OPTICAL MATERIALS; REFLECTION OF ELECTROMAGNETIC RADIATION; REFRACTION OF WAVES. Optical interference filters and coatings. The following are examples of the most important types of optical interference filters and coatings. Antireflection coatings. Antireflection coatings are the first application of optical thin-film interference. Around 1817, Joseph Fraunhofer discovered that tarnished glasses reflected less light than freshly polished glasses. He then proceeded to develop a process for artificially aging glasses to achieve this effect. The results were the first antireflection coatings. To this day, antireflection coatings are one of the most important optical thin-film components in both scientific and consumer instrumentation. The reflection of light from the surface of a transparent substrate at normal incidence can be as little as 4% for glass and as much as 30% for germanium, a material that is used extensively in infrared (IR) optics. Thus, germanium optics without antireflection coatings would lead to huge light losses. In
Fig. 2. Measured performance of antireflection coatings for quartz (curves 1, 2), glass (3–5), silicon (6), Irtran II [ZnS] (7), and germanium (8) surfaces in the (a) ultraviolet-visible and (b) infrared spectral regions. The numbers in brackets represent the reflectances of the surfaces without the antireflection coatings: quartz 3.7%, glass 4%, Irtran II 14%, silicon 27%, germanium 36%.
many instances, even glass surfaces must be antireflection-coated. For example, telephoto lenses for television and personal cameras can consist of many glass–air interfaces that must be antireflection-coated if the lenses are to be useful. This not only increases the light throughput of the objectives but also prevents the formation of stray light through multiple reflections that would reduce the contrast in the image plane. Figure 2 shows the reflectances of substrate–air interfaces of various materials before and after treatment with antireflection coatings. Unless otherwise stated, the curves shown in Fig. 2 and subsequent diagrams are measured values. At the present state of thin-film deposition technology, the difference between theory and experiment is not very significant in most cases. Another special type of antireflection coating, not shown in Fig. 2, is the black layer coating. Such coatings do not transmit or reflect any of the incident light in a given spectral region but absorb all the incident light. Such coatings find uses in high-contrast display devices and energy-conversion coatings.
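Both the antireflection behavior shown in Fig. 2 and the high reflectance discussed next follow from the same interference arithmetic outlined under Theory above. The sketch below is a minimal normal-incidence characteristic-matrix calculation for non-absorbing layers; the indices used for MgF2 (1.38), ZnS (2.35), and glass (1.52) are typical textbook values assumed for illustration, and the sketch is not a substitute for the design programs mentioned in the text.

```python
import math

def reflectance(layers, n_medium, n_substrate, wavelength):
    """Normal-incidence reflectance of a stack of non-absorbing thin films.
    `layers` is a list of (refractive_index, physical_thickness) pairs,
    ordered from the incident-medium side toward the substrate."""
    m11, m12, m21, m22 = 1.0, 0.0, 0.0, 1.0          # identity characteristic matrix
    for n, d in layers:
        delta = 2.0 * math.pi * n * d / wavelength   # phase thickness of the layer
        c, s = math.cos(delta), math.sin(delta)
        a11, a12, a21, a22 = c, 1j * s / n, 1j * n * s, c
        m11, m12, m21, m22 = (m11 * a11 + m12 * a21, m11 * a12 + m12 * a22,
                              m21 * a11 + m22 * a21, m21 * a12 + m22 * a22)
    B = m11 + m12 * n_substrate
    C = m21 + m22 * n_substrate
    r = (n_medium * B - C) / (n_medium * B + C)      # amplitude reflection coefficient
    return abs(r) ** 2

wl = 550e-9                                          # design wavelength, m
qw = lambda n: wl / (4.0 * n)                        # quarter-wave physical thickness

# Single quarter-wave MgF2 layer on glass: reflectance drops from ~4% to ~1.3%
print(reflectance([(1.38, qw(1.38))], 1.0, 1.52, wl))

# Thirteen-layer (HL)^6 H quarter-wave stack of ZnS/MgF2: reflectance above 99%
stack = [(2.35, qw(2.35)), (1.38, qw(1.38))] * 6 + [(2.35, qw(2.35))]
print(reflectance(stack, 1.0, 1.52, wl))
```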
High-reflection coatings. These coatings are used whenever surfaces are required to have a high reflectance. At one time, metallic surfaces were the only way to achieve this goal; that is, metal substrates were polished until they formed mirrors. More recently, opaque metallic layers have been deposited onto polished glass surfaces to achieve the same effect. Some typical spectral reflectance curves of metals are shown in Fig. 3. They have broad regions of high reflectivity. However, when the highest reflectances are required, interference effects in stacks of nonabsorbing multilayers are used. In this way, reflectances as high as 99.9998% can be achieved in narrow parts of the visible and near-infrared spectral regions. Such values are required for some laser applications. Using special designs consisting of a larger number of layers of suitable thicknesses, high reflectances can also be achieved over broader spectral regions. For femtosecond lasers, reflectors are required whose phase dispersions must meet certain criteria. High-reflectance coatings are made for wavelengths ranging from the x-ray to the submillimeter spectral regions, although the reflectances that can be achieved depend greatly on the optical constants of the coating materials available in a given region. The performances of some typical high-reflectance
Fig. 3. Reflecting coatings. The solid curves represent typical measured reflectances of multilayer coatings for the x-ray (1, 2), ultraviolet (3), visible (4), and infrared (5, 6) spectral regions. The reflectances of opaque osmium, aluminum, and silver films (broken-line curves) are shown for comparison. The reflectances of metal layers can be extended or enhanced by additional thin-film coatings (for example, opaque aluminum with a 14-nanometer-thick LiF layer).
Fig. 4. Measured performances of short-wavelength (curves 1–5) and long-wavelength (6–9) cut-off filters for the ultraviolet, visible, and infrared spectral regions.
coatings for different wavelengths are shown in Fig. 3.
Long- and short-wavelength cut-off filters. These cut-off filters have an important technological application. For the visible and adjoining spectral regions, short-wavelength cut-off filters can be more easily produced from colored glasses. However, long-wavelength cut-off filters are difficult to produce except by thin-film interference. Even with this technique, the spectral ranges over which the transmittance remains high and the rejection low leave much to be desired. The performance of typical thin-film interference short- and long-wavelength cut-off filters is shown in Fig. 4.
Achromatic and color-selective beam splitters. Beam splitters are used in many scientific instruments, as well as in some consumer products. Suitable thin-film systems are deposited onto a plate or are located between two right-angled prisms. They split the obliquely incident light beam into two beams, one of which is transmitted and the other reflected. Usually when light is incident onto a multilayer system at nonnormal incidence, the transmittances and reflectances will be different for incident light that is polarized parallel (Tp, Rp) and perpendicular
(Ts, Rs) to the plane of incidence. For beam splitters, this is a disadvantage, and measures are often taken to reduce the effect. When the two beams have a uniform energy distribution in the spectral region of interest, the beam splitters are said to be achromatic. The beams may be of equal intensity, or one beam may be more intense than the other. Examples of the performance of such coatings are shown in Fig. 5. At other times, beam splitters are required in which the incident light is divided into beams of different color. Such beam splitters are called color-selective, and they are used in cameras and projection equipment for television and cinematography. The measured mean transmittances (Tp + Ts)/2 of one set of color-selective beam splitters are shown in Fig. 5c.
Narrow-band-transmission interference filters. These are used in many scientific and engineering applications. In general, the specifications for narrow-band filters not only include the center wavelength, the width at the points where the transmittance is half of the maximum transmittance, and the width of the rejection region where the transmittance is low, but also the widths at 0.1 and 0.01 of the maximum transmittance. The relative values of these widths are a measure of the
Fig. 5. Measured performance of (a) achromatic plate, (b) cube, and (c) color-selective beam splitters. In each case, the light was incident at 45° to the multilayer. The transmittance and reflectance may exhibit a strong dependence on the polarization of the incident light, as in a and b.
Fig. 6. Narrow-band transmission filters. (a) Calculated transmittance curves of filters having progressively squarer transmission bands. (b) Measured transmittances of a number of typical narrow-band interference filters for the ultraviolet, visible, and infrared spectral regions. The widths of the filters at half-maximum transmission, expressed as a percentage of the center wavelength, are indicated. All these filters do not transmit at shorter and longer wavelengths, except for the narrowest filter (curve 5), which requires additional blocking.
squareness of the transmission band (Fig. 6a). The specifications are most stringent for filters for telecom applications, where the transmission in the rejection region may have to be as small as 10−5 and the transmission in the pass band may have to be in excess of 98%. To meet such stringent requirements, filters are required that often consist of more than 150 layers. A few representative narrow-band filters are shown in Fig. 6b. Narrow-band filters can be constructed with half-widths as narrow as 1 nanometer or as wide as 25% of the center wavelength. Narrow-band rejection filters. Sometimes called minus filters, these filters should transmit all the incident radiation except for one or more specified wavelengths. Such filters are used in safety goggles for scientists and technologists working with lasers or for rejecting specific spectral lines in spectroscopic laboratories. Narrow-band rejection filters can consist of many layers made of two materials of close refractive indices, or they can consist of an inhomogeneous layer in which the refractive index oscillates gradually between two close values. Such coatings, often called rugate filters, have the advantage that the transmittance stays high over a broader
range of wavelengths on the short-wavelength side of the rejection band. Examples of the performance of narrow-band rejection filters are shown in Fig. 7.
Polarizers and polarizing beam splitters. These find application in scientific instruments, as well as in liquid-crystal projection displays. Polarizers allow only light of one polarization to pass and block the light of the orthogonal polarization, while polarizing beam splitters separate unpolarized light into two orthogonally polarized transmitted and reflected beams of light. The purity of the polarized beam is measured by the extinction ratio, for example Tp/Ts or Rs/Rp, depending on whether the polarizer acts in transmission or reflection. Depending on the application, the polarizing devices might need to be effective over a narrow or broad spectral region. Most practical polarizers and polarizing beam splitters are based on isotropic interference multilayer coatings and need to operate with light incident at a specified oblique angle θ onto the multilayer.
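For the MacNeille polarizing beam splitter of Fig. 8b, the design rests on having p-polarized light meet every high/low-index interface at the Brewster angle, with Snell's law then fixing the required prism glass index. The short calculation below works through this standard textbook condition using assumed coating indices (ZnS and MgF2); it illustrates the principle only and is not a complete design.

```python
import math

n_H, n_L = 2.35, 1.38             # assumed high/low coating indices (ZnS, MgF2)
theta_cube = math.radians(45.0)   # light travels at 45 degrees inside the cube

# Brewster condition at every H-L interface: tan(theta_H) = n_L / n_H,
# so the p-polarized reflectance vanishes there and only s-light is reflected.
theta_H = math.atan(n_L / n_H)

# Snell's law from the prism glass into the first high-index layer:
# n_glass * sin(45 deg) = n_H * sin(theta_H)
n_glass = n_H * math.sin(theta_H) / math.sin(theta_cube)

print(f"angle inside H layers: {math.degrees(theta_H):.1f} degrees")
print(f"required prism glass index: {n_glass:.2f}")   # about 1.68, a high-index glass
```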
Fig. 7. Measured performance of narrow-band rejection filters based on homogeneous multilayers (curves 1–3) and inhomogeneous layers (4, 5).
Fig. 8. Calculated performance of polarizers and polarizing beam splitters at the design angle θ. (a) Plate polarizers operate in a narrow spectral region with well-collimated incident light and can withstand high laser power. (b) MacNeille polarizing beam splitters operate over a broad spectral range, but their performances deteriorate rapidly with the departure of the incident beam from the design angle. (c) The Li-Li polarizing beam splitter operates over a broad spectral range and has a wide angular field for both the transmitted and reflected beams.
Most thin-film polarizers and polarizing beam splitters are effective only for light that is incident within a small angular field δθair around θ, or operate only in a narrow spectral region. An exception is the Li-Li polarizing beam splitter, which is not yet (mid-2006) available commercially. The calculated spectral transmittances Tp, Ts and reflectances Rp, Rs of three types of polarizers and polarizing beam splitters at the design angles of incidence are shown in Fig. 8. Also given in the diagrams are the approximate extinction ratios for the transmitted and reflected beams at the edges of their angular fields measured in air. See POLARIZATION OF WAVES; POLARIZED LIGHT.
Correction and gain-flattening filters. Correction and gain-flattening filters are used when the spectral energy distribution of the light source or the sensitivity of a detector varies too much for the satisfactory operation of a spectral or telecommunications device. The spectral transmittance of a correction or gain-flattening filter is such that in combination with the source and the detector, it will produce a more uniform response in the spectral region of interest. Correction filters often need to have very irregular spectral transmittances with values varying by orders of magnitude, especially when the light sources used have sharp emission lines.
State of the art. The thin-film interference filters and coatings whose properties are shown in Figs. 2–8 are only a few examples of what can be produced, since filters with properties intermediate to those in the graphs are possible. However, the performance that can be achieved depends also on
the spectral region. At the outer edges of the spectrum for which interference filters are designed, the performance can be much worse than that in the visible and near infrared because of the lack of suitable coating materials. The state of the art of thin-film optical interference filters has made great strides since the 1950s. It is now possible to design filters with almost any spectral transmittance or reflectance characteristics for normally incident light, provided that the spectral region over which they are specified is not too wide. Deposition processes exist now for the manufacture of very stable, mechanically durable coatings. Process controls have been developed that make it possible to produce quite complex multilayer coatings automatically. The accuracy of experimentally produced thin-film interference filters is limited by the accuracy of the measurements made during their deposition. In the future, the use of nanostructured films, with properties that supplement those found in natural materials, should lead to filters with improved properties. J. A. Dobrowolski Bibliography. P. W. Baumeister, Optical Coating Technology, 2004; J. A. Dobrowolski, Optical properties of films and coatings, in Handbook of Optics, vol. 1, 2d ed., ed. by M. Bass, pp. 42.1–42.130, 1995; H. A. Macleod, Thin-film Optical Filters, 3d ed., 2001; J. D. Rancourt, Optical Thin Films Users’ Handbook, 1987.
Interference-free circuitry
Interference-free circuitry Circuitry that is designed to ensure that external sources of interference do not degrade its performance. The design principles involved are very simple, and their application in a systematic manner will reduce interference from external sources to an acceptable level, even for the most sensitive circuits and applications. Electrical currents flow around complete, closed paths. These paths contain the energy sources which generate the currents. In complicated circuit networks comprising many meshes, a current from a given source may well divide and subdivide as it encounters circuit junction points, but these partial currents must recombine to give the original total current on return to a source. Unfortunately, most diagnostic equipment (such as oscilloscopes, voltmeters, and spectrum analyzers) responds to voltage differences, and the currents which cause these must be inferred. Nevertheless, attention must be concentrated on the currents flowing around circuits rather than voltages observed between points. Coupling modes. Consider the simplest possible circuit containing an impedance and a voltage or current detector, as illustrated on the left of Figs. 1, 2, and 3. Suppose it is close to another circuit consisting of a source of interference and a load impedance, as on the right of these diagrams. There are only three ways in which interference from the source can couple in to the left-hand circuit and be registered as an unwanted signal by the output device D. 1. In Fig. 1a, the two circuits have a portion of conductor A-B which is common to both. The current I in the right-hand circuit, which is the source of the interference, flows through this shared conductor, and this results in a voltage difference between A and B. This voltage in turn causes an unwanted interfering current to flow around the lefthand circuit, and to be registered by D. We could term this mode of coupling interference “commonconductor.” It will be effective over a wide range of interfering frequencies and for both high and low values of the impedances in both circuits. Supposing the circuits must be connected at all, the measure to take to prevent interference coupling-in via this mode is to reduce the length of common conductor to a single point, as in Fig. 1b. If the circuits have to be spatially separated, a single conductor can be connected between the two circuits, as in Fig. 1c; and it is obviously still true that however large the current in the interfering circuit, no voltage difference or consequent interference current is produced in the left-hand circuit. 2. In Fig. 2a, there is a connection between the circuits, and a stray capacitance C is shown between them. In Fig. 2b, there is no connection, but two stray capacitances are shown. Currents can circulate via capacitances to cause interference in the left-hand circuit. This capacitive coupling mode is usually significant at higher frequencies for circuits containing high impedances, say of the order of 100 or more. This coupling mode can be eliminated by strategically placed and connected conducting screens
Fig. 1. Common-conductor coupling of two circuits. (a) Two circuits sharing a common conductor AB. Z represents load impedance, D is a detector, ∼ represents a source of interference, and I is current. Coupling is prevented by (b) reducing the common conductor to a point or (c) connecting the circuits with a length of conductor.
between the circuits, as shown in Fig. 2c. 3. In Fig. 3a, the circuits have no common connection, but some of the magnetic flux B created by the interfering circuit threads the left-hand circuit and induces an interfering emf in it and a consequent interference current around it. This mode of interference is most significant where the left-hand circuit is of low impedance, that is, of the order of 10,000 or less. The way to avoid this magnetic coupling mode is to reduce the coupling between the circuits by either increasing the distance between them or decreasing to a very small value the area enclosed by one or, preferably, both circuits. Figure 3b, c, and d illustrates three ways of accomplishing the latter. In Fig. 3b, the conductors of both circuits are routed
Interference-free circuitry close to each other in go-and-return pairs. In Fig. 3c, the area enclosed by the go-and-return pairs is further reduced by being twisted, with the advantage that the small flux generated is in opposing directions from successive loops of the twist and thereby cancels at a little distance from the circuit to an exceedingly small value. In Fig. 3d, the go-and-return conductors are arranged to be the inner and outer conductors of a coaxial cable. The line integral of
Fig. 3. Magnetic coupling of two circuits. (a) The magnetic field B created by the interference current I threads the left-hand circuit. (b) Decreasing the loop area of both circuits reduces magnetic coupling. (c) Twisting the conductors further reduces coupling. (d) An alternative solution employs coaxial connecting cables.
Fig. 2. Capacitive coupling of two circuits. (a) An interference current circulating via a stray capacitance C. (b) An interference current circulating via two stray capacitances. (c) Elimination of coupling by adding a screen to intercept capacitive currents.
The line integral of the magnetic flux density around a concentric circular path of any radius outside the cable is zero because the total current threading it is zero. Because of the cross-sectional symmetry of a circular coaxial cable, this line integral can be zero only if the flux density itself is zero everywhere. Although this result is strictly true only for a straight cable of infinite length, flux elimination is good enough for all practical purposes for short lengths of cable if the cable is not too sharply bent anywhere. A further advantage of the coaxial solution is that if, as is usual, the outer conductors of the sending and receiving circuits are at more or less the same potential and involve no appreciable impedance, they act as screens, and capacitive coupling is also reduced to negligible proportions.
Fig. 4. Vector diagrams illustrating an unexpected result. (a) Summing vectors OA, AB, BC, and CD which represent interference at a single frequency provides the resultant OD. (b) The resultant OD is increased when the interference represented by BC is eliminated.
If, in an actual situation, the outer conductors are themselves interconnected as a network, at lower frequencies additional measures may need to be taken to ensure that the currents in the inner and outer conductors of each cable are equal and opposite. One way of doing this is to thread one cable in each mesh several times through a high-permeability magnetic core to ensure that the total impedance of the mesh is much greater than the go-and-return impedance. The latter is unaffected by the presence of the core. See COAXIAL CABLE; COUPLED CIRCUITS.

Need for systematic approach. In any actual situation, more than one of the three modes will be acting simultaneously, and in a complicated network there will be several coupling routes associated with each mode. Therefore a systematic approach is needed whereby each and every possible route for the ingress of interference is considered in turn and appropriate action taken. Unfortunately, eliminating interference systematically in the common situation where interference at a single frequency enters circuitry through several routes or through more than one mode can cause puzzling results. An example of single-frequency coherent interference is that from mains or line frequency and its harmonics. In this case there may well be partial cancellation at the final output device of the interference entering via the various routes. This situation is illustrated in the vector diagram of Fig. 4a, where the vector representing the sum of the interference is OD. Then if one of the routes of entry is eliminated, say that corresponding to the vector BC, the remaining interference may well sum to a larger value, as in Fig. 4b.
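The vector (phasor) picture of Fig. 4 is easy to reproduce numerically. The sketch below uses made-up route amplitudes and phases (they are purely illustrative) to show how removing one coherent contribution can leave a larger resultant than before.

```python
import cmath
import math

# Three hypothetical coupling routes carrying interference at one frequency,
# represented as phasors (amplitude and phase). Values are illustrative only.
routes = {
    "route_A": cmath.rect(1.0, 0.0),
    "route_B": cmath.rect(1.0, 2 * math.pi / 3),    # 120 degrees
    "route_C": cmath.rect(1.0, -2 * math.pi / 3),   # -120 degrees
}

total = sum(routes.values())
print(f"resultant with all routes:      {abs(total):.3f}")   # ~0.000 (they cancel)

# Eliminate one route (as in Fig. 4b): the remaining contributions no longer
# cancel, and the resultant interference is larger than before.
without_B = sum(v for k, v in routes.items() if k != "route_B")
print(f"resultant with route_B removed: {abs(without_B):.3f}")  # 1.000
```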
This will make a trial-and-error approach impossible, and will force a more considered and systematic approach to be taken. Sometimes, coherent interference of this kind may well accidentally sum to an acceptably small value, or another route of adjustable phase and magnitude may be deliberately created to oppose the total interference. This situation is not a good one, because any change in the circumstances or environment of the circuitry will alter the magnitude and phase of the various coupling modes and the interference will reappear. Besides, interference in one part of sensitive circuitry may cause it to become saturated and to malfunction. Therefore the aim should be reduction to acceptable levels of interference via all possible routes. In the example of mains/line interference, this will have the added advantage that any other interference carried and propagated via mains/line circuitry will also be eliminated.

Isolation. Isolation is often an important concept in eliminating interference. An instrument or section of circuitry is said to be isolated if there are no routes by which significant currents, interfering or otherwise, can flow through the instrument or circuitry from a source onward to other circuitry. Circuitry entirely contained within a conducting enclosure, apart from go-and-return current paths to supply power to it, will be isolated. This description applies to many test and measurement instruments, even when connected to circuitry which is itself isolated, such as a resistance-measuring instrument connected to a resistance thermometer whose leads and sensing element are enclosed in a continuous shield. This shield should be connected only to the shield of the instrument, thus forming an extension of it. Unfortunately, isolation is usually compromised by the mains/line and power supply connections if the instrument or circuitry is connected into a larger, overall system of other instruments and circuits. One way of maintaining isolation in these circumstances is to make use of differential voltage-sensing input circuits which exhibit high input impedance. Battery-operated instruments whose batteries are contained entirely within the conducting case of the instrument can also offer a way out of some difficult situations. See ELECTRICAL SHIELDING.

Misconceptions. Finally, clear thinking about interference elimination will be aided if two common misconceptions are recognized and avoided.

1. Interference does not enter circuitry at localized points or circuit nodes. A complete current path must be involved.

2. For brevity in circuit descriptions and diagrams, many connections are said to be made to a conductor called ground. The term is best avoided, but if old habits die hard, at least it should be recognized that this name is shorthand for a whole network of conductors which return currents to their sources, thereby completing circuit loops for interference coupling modes. This network may well not be an equipotential conductor system because of the currents flowing in it, and may itself give
rise to common-conductor coupling, which can be avoided by the star or single-conductor connections described above. See ELECTRICAL INTERFERENCE; GROUNDING; ELECTROMAGNETIC COMPATIBILITY. Bryan P. Kibble
Bibliography. A Guide To Measuring Direct and Alternating Current and Voltage Below 1 MHz, Institute of Measurement and Control, U.K., 2003; P. Horowitz and W. Hill, The Art of Electronics, 2d ed., Cambridge University Press, 1989; R. Morrison, Grounding and Shielding Techniques in Instrumentation, 3d ed., Wiley-Interscience, 1986.
Interference microscope An instrument for visualizing and measuring differences in the phase of light transmitted through or reflected from microscopic specimens. It is closely allied to the phase-contrast microscope. See PHASE-CONTRAST MICROSCOPE.

Simple theory. A microscopic image is formed by the interaction of all light waves passing through the optical system. In a phase-contrast microscope, light diffracted by a transparent object is spatially separated from nondiffracted light in the back focal plane of the objective, where a phase plate alters the relative phases of the diffracted and nondiffracted beams so that they interfere in the image plane to produce a visible, intensity-modulated image. In an interference microscope, the diffracted and nondiffracted waves are not spatially separated, but light (the object beam) that passes through or is reflected from the object interferes with light (the reference beam) that passes through or is reflected from a different region of the specimen plane or is reflected from a comparison (reference) surface. For interference to be visible, the light beams must be coherent; in other words, the beams must maintain a constant relationship of wavelength, phase, and polarization over a relatively long period. The easiest way to achieve coherence is by using a device such as a semireflecting mirror, which splits a light beam into two beams. Random changes in the properties of successive photons from a given point in the source then affect both beams simultaneously. For a Mach-Zehnder type of interferometer (Fig. 1), light from a source is split into two beams, which after total reflection at the mirrors are recombined. If the optical paths are identical, the light waves arrive in phase at the point where an image is formed, and interfere constructively. A retarding specimen in one beam throws the beams out of phase and produces partial or complete destructive interference where the image is formed. Differences in optical path introduced by various parts of the object can be seen as variations in intensity or color. It may be noted that energy is conserved: light “missing” from the image appears at a right angle to it. The optical path difference introduced by the specimen expresses a relationship between the geometric thickness and the refractive indices of the reference space and the specimen. The optical path difference can be measured by a calibrated compensator. It restores the relative phase of the two beams. See INTERFERENCE OF WAVES.
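As a small numerical illustration of the relationship just described, the optical path difference (OPD) for a specimen of thickness t is t(n_s − n_m), where n_s and n_m are the refractive indices of the specimen and of the reference (mounting) medium, and the corresponding phase difference is 2π·OPD/λ. The thickness, indices, and wavelength below are illustrative values, not taken from the article.

```python
import math

# Illustrative values: a 5-micrometer specimen of index 1.38 in a medium of
# index 1.33, observed with green light.
t_m = 5e-6             # specimen thickness, m
n_specimen = 1.38
n_medium = 1.33
wavelength_m = 550e-9  # m

opd_m = t_m * (n_specimen - n_medium)          # optical path difference
phase_rad = 2 * math.pi * opd_m / wavelength_m

print(f"OPD   = {opd_m * 1e9:.0f} nm")                          # 250 nm
print(f"phase = {phase_rad:.2f} rad "
      f"({opd_m / wavelength_m:.2f} wavelengths)")
```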
Many forms of interferometer have been described, but few have been used in commercial microscopes. A Leitz interference microscope using the Mach-Zehnder principle had two condensers, two objectives, and wide separation of the specimen and reference beams. Maintenance of the necessary mechanical rigidity was difficult and expensive, however, and so several compact, single-objective polarizing microinterferometers have been devised based on the Jamin-Lebedeff principle. In this type of instrument, plane-polarized light from a substage polarizer is split by a birefringent plate cemented to the condenser into two parallel beams polarized at right angles to each other and separated, or sheared, by a distance of about 30–200 micrometers, depending on the magnification of the objective used. After passing through the specimen plane, the beams are recombined by a birefringent plate cemented to the objective. The relative phase of the beams can be adjusted with a polarizing compensator such as a quartz wedge or a compensator that combines a fixed quarter-wave plate and a rotating analyzer. See BIREFRINGENCE.
Fig. 1. Diagram of a Mach-Zehnder type of interference microscope (eyepiece not shown). Light from source A is split at B into coherent object and reference beams, which are reflected at F and C, respectively, and recombined at D. If the optical paths are identical, constructive interference occurs at E and complete destructive interference occurs at G. That relationship, if disturbed by a retarding specimen in the object beam, can be restored by a compensator in the reference beam. (After D. J. Goldstein, Scanning microinterferometry, in G. A. Meek and H. Y. Elder, eds., Analytical and Quantitative Methods in Microscopy, Cambridge University Press, 1977)
Because of the beam splitters, an interference microscope displays two separate images of the specimen. In some instruments, one of these, the so-called ghost image, is astigmatic and out of focus. If the ordinary and ghost images overlap, as can occur with closely packed or large specimens, reliable measurements or observations in the overlapping region are impossible.

Measurement of phase changes. To measure the retardation of a relatively thick specimen, a gradient of retardation can be superimposed on the microscopic field by tilting an optical element or by inserting a retarding wedge in one beam. This produces a series of dark fringes across the field, uniformly spaced at intervals of one wavelength retardation. The deviation of fringes crossing a specimen is proportional to its retardation; whole wavelengths are readily counted, and fractions of a wavelength can be estimated with a precision of about 0.1 wavelength. To measure the retardation more precisely, a calibrated compensator (Fig. 1) is adjusted so that the background is maximally dark by using monochromatic light or some readily identifiable color such as sensitive purple. The compensator setting is recorded and then readjusted until the specimen appears with the same intensity or color as the background had. The difference between the two settings indicates the specimen retardation. The precision of visual setting can be improved appreciably by a half-shade device in the eyepiece, and under ideal circumstances is about 0.01 wavelength. With automatic microinterferometers that use mechanically phase-modulated light and a photoelectric detector, the reproducibility of repeated measurements of optical path difference exceeds 0.001 wavelength. The methods so far described measure the retardation at an arbitrary point in the object plane. In cell biology, however, it is often more useful to measure the integrated optical path difference of a discrete, nonhomogeneous specimen, which is the product of the specimen area and the mean optical path difference. The integrated optical path difference can be estimated by measuring the specimen area and by taking many measurements of local optical path differences or—more rapidly and precisely—by using a microinterferometer in its scanning and integrating mode.

Differential interference–contrast microscope. In Mach-Zehnder and Jamin-Lebedeff interference microscopes, the ordinary and ghost images of a discrete object are relatively widely separated, or sheared, and the specimen is compared with a clear region of background. In differential interference–contrast (DIC) instruments, on the other hand, the shear is very small and the images overlap almost completely; a given point of the specimen is compared not with empty mounting medium but with an adjacent object point, and contrast depends on the gradient of optical path in the direction of shear (Fig. 2). Consider, for example, a uniform sphere mounted in a medium of different refractive index. In the center of the image the optical path gradient is zero and the intensity is identical with the background.
Fig. 2. Principle of a differential interference microscope with Wollaston prisms at the focal planes of the condenser and objective. Nomarski devised modified Wollaston prisms that can be used if the focal planes are not physically accessible. (After W. Lang, Nomarski differential interference–contrast microscopy, Zeiss Inform., no. 70, pp. 114–120, 1969)
At one edge the gradient is maximal and the image is brighter than the background, whereas at the other edge the gradient is opposite in sign and the image is darker than the background. The object appears almost as if shadow-cast. With a given objective and a large condenser aperture, differential interference–contrast gives good lateral resolution and a small depth of focus; as a result, a degree of optical sectioning through the specimen can be achieved. Reducing the condenser aperture increases the contrast somewhat, but differential interference–contrast is less sensitive in general than phase contrast for detecting small phase differences. Note that in transmitted-light differential interference contrast, the contrast is related not to the gradient in geometrical object thickness but to the gradient in optical path, and so the image does not necessarily indicate true surface structure. For example, an internal cell organelle such as a nucleus, with a refractive index higher than that of the cytoplasm, may incorrectly seem to project from or to be an indentation in the surface of the cell. In spite of those interpretation problems, the striking three-dimensional appearance of differential interference–contrast images has made the method popular in the field of biology.

Reflecting specimens. Surface structure can be studied with the Linnik microscope. If its mirror is tilted slightly relative to the reflecting surface, interference fringes appear and irregularities in the reflecting surface are seen as displacements in the fringe system. If the two surfaces are exactly parallel, the fringe will be of infinite width so that irregularities in the reflecting surface appear as intensity variations against a uniformly illuminated background, as in phase contrast.
Nomarski differential interference contrast is also eminently suitable for reflected-light work. Excellent results have been obtained relatively inexpensively by using very narrow interference fringes produced by multiple reflection with an ordinary incident illuminator and a partially reflecting cover glass over the specimen. Also, there are phase-stepping or phase-shifting interference microscopes with which highly reproducible and accurate measurements of a surface can be made automatically and rapidly. This type of instrument typically uses a Mirau interference objective to illuminate the specimen with incident light, and a charge-coupled-device camera to capture an image with superimposed interference fringes. Intensities in the image are measured automatically, and the data are transferred to a computer. A small phase difference is introduced between the object and reference beams by moving either the reference surface or the specimen with a piezoelectric transducer controlled by the computer; this alters the fringes, and the intensities are again recorded. Provided at least three sets of intensity measurements are available, it is possible to compute the phase change at, and hence the height of, each image point. Some confocal scanning microscopes are able to function as microinterferometers, with the characteristic benefits of improved lateral resolution and optical sectioning ability given by confocal scanning. See CONFOCAL MICROSCOPY.
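The phase computation used in phase-stepping instruments can be sketched as follows. The article requires only that at least three intensity frames be available; the sketch below assumes the common choice of four frames with 90° reference-phase steps, which is one standard algorithm rather than the only one. For reflection, a phase change ϕ corresponds to a height of ϕ·λ/(4π). The frame values are synthetic.

```python
import math

def phase_from_four_steps(i0, i90, i180, i270):
    """Phase (radians) at one image point from four intensity samples
    taken with 0, 90, 180, and 270 degree reference-phase shifts."""
    return math.atan2(i270 - i90, i0 - i180)

def height_from_phase(phase_rad, wavelength_m):
    """Surface height in reflection: the optical path changes by twice the height."""
    return phase_rad * wavelength_m / (4 * math.pi)

# Synthetic samples for one pixel: I(k) = a + b*cos(phi + k*90 deg), phi = 0.7 rad
a, b, phi = 2.0, 1.0, 0.7
samples = [a + b * math.cos(phi + k * math.pi / 2) for k in range(4)]

recovered = phase_from_four_steps(*samples)
print(f"recovered phase: {recovered:.3f} rad (expected {phi})")
print(f"height at 633 nm: {height_from_phase(recovered, 633e-9) * 1e9:.1f} nm")
```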
Comparison with phase-contrast microscope. The phase-contrast microscope can be regarded as an imperfect interference microscope in which the necessarily finite width of the phase annulus produces incomplete separation of the two interfering beams. The resulting halo effect and accentuation of edges are absent in some types of interference microscope. Although eliminating optical artifacts reduces the risk of incorrect image interpretation, it is not necessarily advantageous for purely observational work. Ghost images may hamper considerably the use of the interference microscope to view extended or densely packed specimens. In addition, the relative sensitivity, low cost, and ease of adjustment of the phase-contrast microscope make it the instrument of choice for routine observations and for measurement of refractive indices by the immersion method. The interference microscope is, however, essential for measuring phase change.

Biological applications. Knowing the thickness of the section being examined is often valuable in stereological work. Plastic sections more than 1 µm in thickness can readily be measured with a visual microinterferometer, and with an automatic instrument even ultrathin sections (about 50–70 nanometers) for electron microscopy can be measured with a reproducibility better than 5%. The primary biological use of microinterferometry, however, is probably the determination of cellular dry mass. With an automatic scanning and integrating microinterferometer, the dry mass of a living mammalian cell can be precisely and nondestructively determined in a few seconds. Because measured optical path difference depends on the product of thickness and refractive index difference, only one of those quantities need be known to calculate the other. Thus the combination of interference microscopy and immersion microrefractometry is capable of yielding much useful quantitative information. In some cases it has proved possible to determine cellular volume, thickness, concentration of solids and water, and total dry and wet mass. Interference microscopy has also been successfully used in quantitative histochemistry for studying changes in dry mass produced by enzymic digestion or specific extraction procedures.

Nonbiological procedures. Like the phase-contrast microscope, the transmission interference microscope can be applied to the qualitative study of transparent specimens such as fibers and crystals. Incident-light instruments are employed in metallurgy, engineering, and the electronics industry to study specimens such as metallic surfaces, metalized replicas of surfaces, and electronic circuits. In that type of work it is a common practice to operate the instrument with fringes in the field of view, the shape of the fringes giving a contour map of the surface. The sensitivity is such that irregularities of the order of 0.01 wavelength or even less can be observed. The method gives valuable information about the quality of ground, polished, or etched surfaces. The differential (Nomarski) type of interference microscope has proved particularly useful for studying the surface structure of reflecting objects, since with that type of specimen it gives a true relief image uncomplicated by variations in refractive index. Vertical resolution on the order of 0.1 nm has been obtained with incident-light differential interference–contrast microscopes combined with electronic image-analysis devices. See DIFFRACTION; OPTICAL MICROSCOPE; POLARIZED LIGHT; POLARIZED LIGHT MICROSCOPE; RESOLVING POWER (OPTICS). D. J. Goldstein
Bibliography. M. Francon and S. Mallick, Polarization Interferometers: Applications in Microscopy and Macroscopy, 1971; W. Lang, Nomarski differential interference–contrast microscopy, Zeiss Inform., nos. 70 and 71, 1969; G. A. Meek and H. Y. Elder (eds.), Analytical and Quantitative Methods in Microscopy, 1977; M. Pluta, Advanced Light Microscopy, vol. 3, 1993; S. Tolansky, Multiple-Beam Interference Microscopy of Metals, 1970.
Interference of waves The process whereby two or more waves of the same frequency or wavelength combine to form a wave whose amplitude is the sum of the amplitudes of the interfering waves. The interfering waves can be electromagnetic, acoustic, or water waves, or in fact any periodic disturbance. The most striking feature of interference is the effect of adding two waves in which the trough of one wave coincides with the peak of another. If the two waves are of equal amplitude, they can cancel each other out so that the resulting amplitude is zero. This
is perhaps most dramatic in sound waves; it is possible to generate acoustic waves to arrive at a person's ear so as to cancel out disturbing noise. In optics, this cancellation can occur for particular wavelengths in a situation where white light is a source. The resulting light will appear colored. This gives rise to the iridescent colors of beetles' wings and mother-of-pearl, where the substances involved are actually colorless or transparent.

Two-beam interference. The quantitative features of the phenomenon can be demonstrated most easily by considering two interfering waves. The amplitude of the first wave at a particular point in space can be written as Eq. (1), where A0 is the peak amplitude, and ω is 2π times the frequency:

A = A0 sin(ωt + ϕ1)    (1)

For the second wave Eq. (2) holds, where ϕ1 − ϕ2 is the phase difference between the two waves:

B = B0 sin(ωt + ϕ2)    (2)

In interference, the two waves are superimposed, and the resulting wave can be written as Eq. (3). Equation (3) can be expanded to give Eq. (4):

A + B = A0 sin(ωt + ϕ1) + B0 sin(ωt + ϕ2)    (3)

A + B = (A0 sin ϕ1 + B0 sin ϕ2) cos ωt + (A0 cos ϕ1 + B0 cos ϕ2) sin ωt    (4)

By writing Eqs. (5) and (6), Eq. (4) becomes Eq. (7), where C² is defined in Eq. (8):

A0 sin ϕ1 + B0 sin ϕ2 = C sin ϕ3    (5)

A0 cos ϕ1 + B0 cos ϕ2 = C cos ϕ3    (6)

A + B = C sin(ωt + ϕ3)    (7)

C² = A0² + B0² + 2A0B0 cos(ϕ2 − ϕ1)    (8)

When C is less than A or B, the interference is called destructive. When it is greater, it is called constructive.
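Equations (5)–(8) amount to adding the two waves as phasors. A minimal numerical check, with amplitudes and phases chosen arbitrarily:

```python
import cmath
import math

# Arbitrary illustrative values
A0, phi1 = 1.0, 0.0
B0, phi2 = 0.7, math.radians(120)

# Phasor sum gives the resultant amplitude C and phase phi3 of Eq. (7)
resultant = A0 * cmath.exp(1j * phi1) + B0 * cmath.exp(1j * phi2)
C, phi3 = abs(resultant), cmath.phase(resultant)

# Eq. (8): C^2 = A0^2 + B0^2 + 2*A0*B0*cos(phi2 - phi1)
C_sq_eq8 = A0**2 + B0**2 + 2 * A0 * B0 * math.cos(phi2 - phi1)

print(f"C from phasor sum = {C:.4f}")
print(f"C from Eq. (8)    = {math.sqrt(C_sq_eq8):.4f}")
print(f"phi3              = {math.degrees(phi3):.1f} degrees")
```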
For electromagnetic radiation, such as light, the amplitude in Eq. (7) represents an electric field strength. This field is a vector quantity and is associated with a particular direction in space, the direction being generally at right angles to the direction in which the wave is moving. These electric vectors can be added even when they are not parallel. For a discussion of the resulting interference phenomena see POLARIZED LIGHT; SUPERPOSITION PRINCIPLE.

In the case of radio waves or microwaves which are generated with vacuum tube or solid-state oscillators, the frequency requirement for interference is easily met. In the case of light waves, it is more difficult. Here the sources are generally radiating atoms. The smallest frequency spread from such a light source will still have a bandwidth of the order of 10⁷ Hz. Such a bandwidth occurs in a single spectrum line, and can be considered a result of the existence of wave trains no longer than 10⁻⁸ s. The frequency spread associated with such a pulse can be written as notation (9), where Δt is the pulse length:

Δf ≈ 1/(2π Δt)    (9)
Fig. 1. Young’s two-slit interference.
This means
that the amplitude and phase of the wave which is the sum of the waves from two such sources will shift at random in times shorter than 10⁻⁸ s. In addition, the direction of the electric vector will shift in these same time intervals. Light which has such a random direction for the electric vector is termed unpolarized. When the phase shifts and direction changes of the light vectors from two sources are identical, the sources are termed coherent.

Splitting of light sources. To observe interference with waves generated by atomic or molecular transitions, it is necessary to use a single source and to split the light from the source into parts which can then be recombined. In this case, the amplitude and phase changes occur simultaneously in each of the parts at the same time.

Young's two-slit experiment. The simplest technique for producing a splitting from a single source was used by T. Young in 1801 and was one of the first demonstrations of the wave nature of light. In this experiment, a narrow slit is illuminated by a source, and the light from this slit is caused to illuminate two adjacent slits. The light from these two parallel slits can interfere, and the interference can be seen by letting the light from the two slits fall on a white screen. The screen will be covered with a series of parallel fringes. The location of these fringes can be derived approximately as follows: In Fig. 1, S1 and S2 are the two slits separated by a distance d. Their plane is a distance l from the screen. Since the slit S0 is equidistant from S1 and S2, the intensity and phase of the light at each slit will be the same. The light falling on P from slit S1 can be represented by Eq. (10) and from S2 by Eq. (11):

A = A0 sin 2πf(t − x1/c)    (10)

B = A0 sin 2πf(t − x2/c)    (11)

Here f is the frequency, t the time, and c the velocity of light; x1 and x2 are the distances of P from S1 and S2, and A0 is the amplitude. This amplitude is assumed to be the same for each wave since the slits are close together, and x1 and x2 are thus nearly the same. These equations are the same as Eqs. (1) and (2), with ϕ1 = −2πf x1/c and ϕ2 = −2πf x2/c. Accordingly, the square of the amplitude, or the intensity at P, can be written as Eq. (12):

I = 4A0² cos²[πf(x1 − x2)/c]    (12)
Fig. 2. Fresnel's double-mirror interference.
In general, l is very much larger than y, so that Eq. (12) can be simplified to Eq. (13):

I = 4A0² cos²(πyd/lλ)    (13)

Equation (13) is a maximum when Eq. (14) holds and a minimum when Eq. (15) holds, where n is an integer:

y = nλl/d    (14)

y = (n + 1/2)λl/d    (15)

Accordingly, the screen is covered with a series of light and dark bands called interference fringes.
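Equations (13)–(15) are easy to evaluate numerically. A minimal sketch with illustrative numbers (helium-neon wavelength, a half-millimeter slit separation, a one-meter screen distance):

```python
wavelength = 633e-9   # m (He-Ne laser line)
d = 0.5e-3            # slit separation, m
l = 1.0               # slit-to-screen distance, m

fringe_spacing = wavelength * l / d                          # from Eq. (14)
bright = [n * wavelength * l / d for n in range(4)]          # Eq. (14)
dark = [(n + 0.5) * wavelength * l / d for n in range(4)]    # Eq. (15)

print(f"fringe spacing: {fringe_spacing * 1e3:.3f} mm")      # 1.266 mm
print("bright fringes (mm):", [f"{y * 1e3:.3f}" for y in bright])
print("dark fringes (mm):  ", [f"{y * 1e3:.3f}" for y in dark])
```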
If the source behind slit S0 is white light and thus has wavelengths varying perhaps from 400 to 700 nm, the fringes are visible only where x1 − x2 is a few wavelengths, that is, where n is small. At large values of n, the position of the nth fringe for red light will be very different from the position for blue light, and the fringes will blend together and be washed out. With monochromatic light, the fringes will be visible out to values of n which are determined by the diffraction pattern of the slits. For an explanation of this see DIFFRACTION.

The energy carried by a wave is measured by the intensity, which is equal to the square of the amplitude. In the preceding example of the superposition of two waves, the intensity of the individual waves in Eqs. (1) and (2) is A² and B², respectively. When the phase shift between them is zero, the intensity of the resulting wave is given by Eq. (16):

(A + B)² = A² + 2AB + B²    (16)
This would seem to be a violation of the first law of thermodynamics, since this is greater than the sum of the individual intensities. In any specific experiment, however, it turns out that the energy from the source is merely redistributed in space. The excess energy which appears where the interference is constructive will disappear in those places where the interference is destructive. This is illustrated by the fringe pattern in the Young two-slit experiment. The energy on the screen from each slit alone is given by Eq. (17), where A0² is the intensity of the light from each slit as given by Eq. (10):

E1 = ∫₀^∞ A0² dy    (17)
The intensity from the two slits without interference would be twice this value. The intensity with interference is given by Eq. (18):

E3 = ∫₀^∞ 4A0² cos²(2πyd/lλ) dy    (18)

The comparison between 2E1 and E3 need be made only over a range corresponding to one full cycle of fringes. This means that the argument of the cosine in Eq. (18) need be taken only from zero to π. This corresponds to a section of screen going from the center to a distance y = lλ/2d. From the two slits individually, the energy in this section of screen can be written as Eq. (19):

2E1 = 2∫₀^(lλ/2d) A0² dy = A0²lλ/d    (19)

With interference, the energy is given by Eq. (20). Equation (20) can be written as Eq. (21):

E3 = ∫₀^(lλ/2d) 4A0² cos²(2πyd/lλ) dy    (20)

E3 = (lλ/2πd) ∫₀^π 4A0² cos²ϕ dϕ = A0²lλ/d    (21)

Thus, the total energy falling on the screen is not changed by the presence of interference. The energy density at a particular point is, however, drastically changed. This fact is most important for those waves of the electromagnetic spectrum which can be generated by vacuum-tube oscillators. The sources of radiation or antennas can be made to emit coherent waves which will undergo interference. This makes possible a redistribution of the radiated energy. Quite narrow beams of radiation can be produced by the proper phasing of a linear antenna array. See ANTENNA (ELECTROMAGNETISM). The double-slit experiment also provides a good illustration of Niels Bohr's principle of complementarity. For detailed information on this see QUANTUM MECHANICS.
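Returning to the energy bookkeeping of Eqs. (17)–(21), the equality of the two totals can be checked numerically. A short sketch, using the same illustrative slit parameters as before:

```python
import math

A0 = 1.0
wavelength, d, l = 633e-9, 0.5e-3, 1.0
y_max = l * wavelength / (2 * d)     # one full fringe cycle, as in Eqs. (19)-(20)

# Numerical integration of Eq. (20) by a simple Riemann sum
n = 100000
dy = y_max / n
E3 = sum(4 * A0**2 * math.cos(2 * math.pi * y * d / (l * wavelength))**2 * dy
         for y in (i * dy for i in range(n)))

E_two_slits = 2 * A0**2 * y_max                  # Eq. (19): 2*E1
E_closed_form = A0**2 * l * wavelength / d       # result of Eqs. (19) and (21)

print(f"E3 (interference, numerical): {E3:.3e}")
print(f"2*E1 (no interference):       {E_two_slits:.3e}")
print(f"closed form A0^2*l*lambda/d:  {E_closed_form:.3e}")
```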
Fresnel double mirror. Another way of splitting the light from the source is the Fresnel double mirror (Fig. 2). Light from the slit S0 falls on two mirrors M1 and M2 which are inclined to each other at an angle of the order of a degree. On a screen where the illumination from the two mirrors overlaps, there will appear a set of interference fringes. These are the same as the fringes produced in the two-slit experiment, since the light on the screen comes from the images S′1 and S′2 of the slit formed by the two mirrors, and these two images are the equivalent of two slits.

Fresnel biprism. A third way of splitting the source is the Fresnel biprism. A sketch of a cross section of this device is shown in Fig. 3. The light from the slit at S0 is transmitted through the two halves of the prism to the screen.
Fig. 3. Fresnel biprism interference.
Fig. 4. Equipment for demonstrating Fresnel biprism interference.
The beam from each half will strike the screen at a different angle and will appear to come from a source which is slightly displaced from the original slit. These two virtual slits are shown in the sketch at S′1 and S′2. Their separation will depend on the distance of the prism from the slit S0 and on the angle θ and index of refraction of the prism material. In Fig. 3, a is the distance of the slit from the biprism, and l the distance of the biprism from the screen. The distance of the two virtual slits from the screen is thus a + l. The separation of the two virtual slits is given by Eq. (22), where µ is the refractive index of the prism material:

d = 2a(µ − 1)θ    (22)

This can be put in Eq. (14) for the two-slit interference pattern to give Eq. (23) for the position of a bright fringe:

y = nλ(a + l)/[2a(µ − 1)θ]    (23)
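Equations (22) and (23) can be evaluated directly; the prism angle, refractive index, and distances below are illustrative values, not taken from the experiment of Fig. 4.

```python
import math

mu = 1.5                      # refractive index of prism glass (illustrative)
theta = math.radians(0.5)     # biprism angle, rad (illustrative)
a = 0.10                      # slit-to-prism distance, m
l = 1.0                       # prism-to-screen distance, m
wavelength = 546e-9           # green mercury line, m

d = 2 * a * (mu - 1) * theta               # Eq. (22): virtual-slit separation
spacing = wavelength * (a + l) / d         # bright-fringe spacing, from Eq. (23)

print(f"virtual-slit separation d: {d * 1e3:.3f} mm")
print(f"fringe spacing:            {spacing * 1e3:.3f} mm")
```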
A photograph of the experimental equipment for demonstrating interference with the Fresnel biprism is shown in Fig. 4. A typical fringe pattern is shown in Fig. 5. This pattern was obtained with a mercury-arc source, which has several strong spectrum lines, accounting in part for the intensity variation in the pattern. The pattern is also modified by diffraction at the apex of the prism.

Billet split lens. The source can also be split with the Billet split lens (Fig. 6). Here a simple lens is sawed into two parts which are slightly separated.

Lloyd's mirror. An important technique of splitting the source is with Lloyd's mirror (Fig. 7). The slit S1 and its virtual image S′2 constitute the double source. Part of the light falls directly on the screen, and part is reflected at grazing incidence from a plane mirror. This experiment differs from the previously discussed experiments in that the two beams are no longer identical. If the screen is moved to a point where it is nearly in contact with the mirror, the fringe of zero path difference will lie on the intersection of the mirror plane with the screen. This fringe turns out to be dark rather than light, as it was in the previous interference experiments. The only explanation for this result is that light experiences a 180° phase shift on reflection from a material of higher refractive index than its surrounding medium. The equation for maximum and minimum light intensity at the screen must thus be interchanged for Lloyd's mirror fringes.

Amplitude splitting. The interference experiments discussed have all been done by splitting the wavefront of the light coming from the source. The energy from the source can also be split in amplitude. With such amplitude-splitting techniques, the light from the source falls on a surface which is partially reflecting. Part of the light is transmitted, part is reflected, and after further manipulation these parts are recombined to give the interference. In one type of experiment, the light transmitted through the surface is reflected from a second surface back through the partially reflecting surface, where it combines with the wave reflected from the first surface (Fig. 8). Here the arrows represent the normal to the wavefront of the light passing through surface S1 to surface S2. The wave is incident at A and C.
Fig. 5. Interference fringes formed with Fresnel biprism and mercury-arc light source.
Fig. 6. Billet split-lens interference.
Fig. 7. Lloyd's mirror interference.
The section at A is partially transmitted to B, where it is again partially reflected to C. The wave leaving C now consists of two parts, one of which has traveled a longer distance than the other. These two waves will interfere. Let AD be the perpendicular from the ray at A to the ray going to C. The path difference Δ will be given by Eq. (24), where µ is the refractive index of the medium between the surfaces S1 and S2, and AB and CD are defined in Eqs. (25) and (26):

Δ = 2µ(AB) − (CD)    (24)

(AB) = d/cos r    (25)

(CD) = 2(AB) sin r sin i    (26)

From Snell's law, Eq. (27) is obtained, and thus Eq. (28) holds:

sin i = µ sin r    (27)

Δ = 2µd/cos r − (2µd/cos r) sin²r = 2µd cos r    (28)

The path difference expressed in terms of the wavelength and the phase difference are, respectively, given by Eqs. (29) and (30):

Δ/λ = 2µd cos r/λ    (29)

ϕ = 4πµd cos r/λ + π    (30)

The phase difference of π radians is added because of the phase shift experienced by the light reflected at S1. The experimental proof of this 180° phase shift was shown in the description of interference with Lloyd's mirror.
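Equations (24)–(30) can be evaluated for a specific film; the thickness, index, and angle of incidence below are illustrative.

```python
import math

mu = 1.33            # refractive index of the film (illustrative)
d = 1.0e-6           # film thickness, m (illustrative)
i = math.radians(30.0)           # angle of incidence
wavelength = 550e-9              # m

r = math.asin(math.sin(i) / mu)                 # Snell's law, Eq. (27)
delta = 2 * mu * d * math.cos(r)                # path difference, Eq. (28)
phi = 4 * math.pi * mu * d * math.cos(r) / wavelength + math.pi   # Eq. (30)

print(f"refraction angle r: {math.degrees(r):.2f} degrees")
print(f"path difference:    {delta * 1e6:.3f} micrometers "
      f"({delta / wavelength:.2f} wavelengths)")
print(f"phase difference:   {phi:.2f} rad")
```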
Fig. 8. Dielectric-plate-reflection interference.
If the plate of material has a lower index than the medium in which it is immersed, the π radians must still be added in Eq. (30), since now the beam reflected at S2 will experience this extra phase shift. The purely pragmatic necessity of such an additional phase shift can be seen by considering the intensity of the reflected light when the surfaces S1 and S2 almost coincide. Without the extra phase shift, the two reflected beams would be in phase and the reflection would be strong. This is certainly not proper for a film of vanishing thickness. Constructive interference will take place at wavelengths for which ϕ = 2mπ, where m is an integer. If the surfaces S1 and S2 are parallel, the fringes will be located optically at infinity. If they are not parallel, d will be a function of position along the surfaces and the fringes will be located near the surface. The intensity of the fringes will depend on the value of the partial reflectivity of the surfaces.

Testing of optical surfaces. Observation of fringes of this type can be used to determine the contour of a surface. The surface to be tested is put close to an optically flat plate. Monochromatic light is reflected from the two surfaces and examined as in Fig. 8. One of the first experiments with fringes of this type was performed by Isaac Newton. A convex lens is pressed against a glass plate and illuminated with monochromatic light. A series of circular interference fringes known as Newton's rings appear around the point of contact. From the separation between the fringes, it is possible to determine the radius of curvature of the lens.

Thin films. Interference fringes of this two-surface type are responsible for the colors which appear in oil films floating on water. Here the two surfaces are the oil-air interface and the oil-water interface. The films are close to a visible light wavelength in thickness. If the thickness is such that, in a particular direction, destructive interference occurs for green light, red and blue will still be reflected and the film will have a strong purple appearance. This same general phenomenon is responsible for the colors of beetles' wings.

Channeled spectrum. Amplitude splitting shows clearly another condition that must be satisfied for interference to take place. The beams from the source must not only come from identical points, but they must also originate from these points at nearly the same time. The light which is reflected from C in Fig. 8 originates from the source later than the light which makes a double traversal between S1 and S2. If the surfaces are too far apart, the spectral regions of constructive and destructive interference become so close together that they cannot be resolved. In the case of interference by wavefront splitting, the light from different parts of a source could only be considered coherent if examined over a sufficiently short time interval. In the case of amplitude splitting, the interference when surfaces are widely separated can only be seen if examined over a sufficiently narrow frequency interval. If the two surfaces are illuminated with white light and the eye is used as the analyzer, interference cannot be seen when the separation is more than a few wavelengths. The interval between successive wavelengths of constructive interference becomes so small that each spectral region to which the eye is sensitive is illuminated, and no color is seen. In this case, the interference can again be seen by examining the reflected light with a spectroscope. The spectrum will be crossed with a set of dark fringes at those wavelengths for which there is destructive interference. This is called a channeled spectrum. For large separations of the surfaces, the separation between the wavelengths of destructive interference becomes smaller than the resolution of the spectrometer, and the fringes are no longer visible.

Fresnel coefficient. The amplitude of the light reflected at normal incidence from a dielectric surface is given by the Fresnel coefficient, Eq. (31), where A0 is the amplitude of the incident wave and n1 and n2 are the refractive indices of the materials in the order in which they are encountered by the light:

A = A0 (n1 − n2)/(n1 + n2)    (31)

In the simple case of a dielectric sheet, the intensity of the light reflected normally will be given by Eq. (32), where B is the amplitude of the wave which has passed through the sheet and is reflected from the second surface and back through the sheet to join A:

C² = A² + B² + 2AB cos ϕ    (32)

The value of B is given by Eq. (33), where the approximation is made that the intensity of the light is unchanged by passing through the first surface and where n3 is the index of the material at the boundary of the far side of the sheet:

B = A0 (n2 − n3)/(n2 + n3)    (33)

Nonreflecting film. An interesting application of Eq. (32) is the nonreflecting film. A single dielectric layer is evaporated onto a glass surface to reduce the reflectivity of the surface to the smallest possible value. From Eq. (32) it is clear that this takes place when cos ϕ = −1. If the surface is used in an instrument with a broad spectral range, such as a visual device, the film thickness should be adjusted to put the interference minimum in the first order and in the middle of the desired spectral range. For the eye, this wavelength is approximately in the yellow so that such films reflect in the red and blue and appear purple. The index of the film should be chosen to make C² = 0. At this point Eqs. (34)–(36) hold:

(A − B)² = 0    (34)

(n1 − n2)/(n1 + n2) = (n2 − n3)/(n2 + n3)    (35)

n1n2 − n2² + n1n3 − n2n3 = n1n2 − n1n3 + n2² − n2n3    (36)

Equation (36) can be reduced to Eq. (37):

n2 = √(n1 n3)    (37)

In the case of a glass surface in air, n1 = 1 and n3 ≅ 1.5. Magnesium fluoride is a substance which is frequently used as a nonreflective coating, since it is hard and approximately satisfies the relationship of Eq. (37). The purpose of reducing the reflection from an optical element is to increase its transmission, since the energy which is not reflected is transmitted. In the case of a single element, this increase is not particularly important. Some optical instruments may have 15–20 air-glass surfaces, however, and the coating of these surfaces gives a tremendous increase in transmission.

Haidinger fringes. When the second surface in two-surface interference is partially reflecting, interference can also be observed in the wave transmitted through both surfaces. The interference fringes will be complementary to those appearing in reflection. Their location will depend on the parallelism of the surfaces. For plane parallel surfaces, the fringes will appear at infinity and will be concentric rings. These were first observed by W. K. Haidinger and are called Haidinger fringes.

Multiple-beam interference. If the surfaces S1 and S2 are strongly reflecting, it is necessary to consider multiple reflections between them. For air-glass surfaces, this does not apply since the reflectivity is of the order of 4%, and the twice-reflected beam is much reduced in intensity. In Fig. 9 the situation in which the surfaces S1 and S2 have reflectivities r1 and r2 is shown. The space between the surfaces has an index n2 and thickness d. An incident light beam of amplitude A is partially reflected at the first surface. The transmitted component is reflected at S2 and is reflected back to S1 where a second splitting takes place. This is repeated. Each successive component of the waves leaving S1 is retarded with respect to the next. The amount of each retardation is given by Eq. (38):

ϕ = 4πn2d cos θ/λ    (38)

Equation (7) was derived for the superposition of two waves. It is possible to derive a similar expression for the superposition of many waves.
Fig. 9. Multiple reflection of wave between two surfaces.
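Referring back to the nonreflecting-film condition of Eqs. (31)–(37): with the film index chosen as √(n1n3) and the optical thickness set to a quarter wavelength (so that cos ϕ = −1 in Eq. (32)), the two reflected amplitudes cancel. A minimal check with illustrative numbers:

```python
import math

n1, n3 = 1.0, 1.5                 # air and glass (illustrative)
n2 = math.sqrt(n1 * n3)           # Eq. (37): index of the ideal single-layer coating
wavelength = 550e-9               # design wavelength, m (illustrative)

A = (n1 - n2) / (n1 + n2)         # Eq. (31), taking A0 = 1
B = (n2 - n3) / (n2 + n3)         # Eq. (33)

# Eq. (32) with cos(phi) = -1 (quarter-wave optical thickness of the film)
C_squared = A**2 + B**2 - 2 * A * B
print(f"coating index n2 = {n2:.3f}")
print(f"quarter-wave film thickness = {wavelength / (4 * n2) * 1e9:.1f} nm")
print(f"reflected intensity C^2 at the design wavelength = {C_squared:.2e}")  # ~0
```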
Fig. 10. The shape of multiple-beam fringes for different values of surface reflectivity.
From Fig. 9, the different waves at a plane somewhere above S1 can be represented by the following expressions:

Incoming wave: A sin ωt
First reflected wave: Ar1 sin ωt
Second reflected wave: A(1 − r1²)r2 sin(ωt + ϕ)
Third reflected wave: −A(1 − r1²)r1r2² sin(ωt + 2ϕ)
By inspection of these terms, one can write down the complete series. As in Eq. (3), the sine terms can be broken down and coefficients collected. A simpler method is to multiply each term by i = √−1 and add a cosine term with the same coefficient and argument. The individual terms then are all of the form of expression (39), where m is an integer:

B e^(−iωt) e^(−imϕ)    (39)
The individual terms of expression (39) can be easily summed. For the reflected wave one obtains Eq. (40):

R = (r1 + r2 e^(−iϕ))/(1 + r1r2 e^(−iϕ))    (40)
Again, as in the two-beam case, the minimum in the reflectivity R is obtained when ϕ = Nπ, where N is an odd integer and r1 = r2. The fringe shape, however, can be quite different from the earlier case, depending on the values of the reflectivities r1 and r2. The greater these values, the sharper become the fringes.
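The sharpening of the fringes with increasing reflectivity, plotted in Fig. 10, follows directly from Eq. (40). A short sketch evaluating |R|² for a few values of r = r1 = r2:

```python
import cmath
import math

def reflected_intensity(r, phi):
    """|R|^2 from Eq. (40) with equal surface reflectivities r1 = r2 = r."""
    R = (r + r * cmath.exp(-1j * phi)) / (1 + r * r * cmath.exp(-1j * phi))
    return abs(R) ** 2

for r in (0.1, 0.5, 0.9):
    # Sample the fringe between a minimum (phi = pi) and a maximum (phi = 2*pi);
    # the dip around phi = pi narrows as r increases.
    phis = (math.pi, 1.25 * math.pi, 1.5 * math.pi, 1.75 * math.pi, 2 * math.pi)
    values = [reflected_intensity(r, phi) for phi in phis]
    print(f"r = {r}: " + "  ".join(f"{v:.3f}" for v in values))
```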
It was shown earlier how two-beam interference could be used to measure the contour of a surface. In this technique, a flat glass test plate was placed over the surface to be examined and monochromatic interference fringes were formed between the test surface and the surface of the plate. These two-beam fringes have intensities which vary as the cosine squared of the path difference. It is very difficult with such fringes to detect variations in fringe straightness or, in other terms, variations of surface planarity that are smaller than 1/20 wavelength. If the surface to be examined is coated with silver and the test surface is also coated with a partially transmitting metallic coat, the reflectivity increases to a point where many beams are involved in the formation of the interference fringes. The shape of the fringes is given by Eq. (40). The shape of fringes for different values of r is shown in Fig. 10. With high-reflectivity fringes, the sensitivity to a departure from planarity is increased far beyond 1/20 wavelength. It is thus possible with partially silvered surfaces to get a much better picture of small irregularities than with uncoated surfaces. The increase in sensitivity is such that steps in cleaved mica as small as 1 nm in height can be seen by examining the monochromatic interference fringes produced between a silvered mica surface and a partially silvered glass flat. See INTERFERENCE FILTERS; INTERFEROMETRY. Bruce H. Billings
Bibliography. M. Born and E. Wolf, Principles of Optics, 7th ed., 1999; F. A. Jenkins and H. E. White, Fundamentals of Optics, 4th ed., 1976; J. R. Meyer-Arendt, Introduction to Classical and Modern Optics, 4th ed., 1995; F. L. Pedrotti and L. S. Pedrotti, Introduction to Optics, 3d ed., 2006.
Interferometry The design and use of optical interferometers. Optical interferometers based on both two-beam interference and multiple-beam interference of light are extremely powerful tools for metrology and spectroscopy. A wide variety of measurements can be performed, ranging from determining the shape of a surface to an accuracy of less than a millionth of an inch (25 nanometers) to determining the separation, by millions of miles, of binary stars. In spectroscopy, interferometry can be used to determine the hyperfine structure of spectrum lines. By using lasers in classical interferometers as well as holographic interferometers and speckle interferometers, it is possible to perform deformation, vibration, and contour measurements of diffuse objects that could not previously be performed. Basic classes of interferometers. There are two basic classes of interferometers: division of wavefront and division of amplitude. Figure 1 shows two arrangements for obtaining division of wavefront. For the Young’s double pinhole interferometer (Fig. 1a), the light from a point source illuminates two pinholes. The light diffracted by these pinholes gives the interference of two point sources. For the Lloyd’s mirror experiment (Fig. 1b), a mirror is used to provide a second image S2 of the point source S1, and in the region of overlap of the two beams the interference of two spherical beams can be observed. There are many other ways of obtaining division of wavefront; however, in each case the light leaving the source is spatially split, and then by use of diffraction, mirrors, prisms, or lenses the two spatially separated beams are superimposed.
Fig. 1. Interference produced by division of wavefront. (a) Young's two-pinhole interferometer. (b) Lloyd's mirror.
Fig. 2. Division of amplitude.
Figure 2 shows one technique for obtaining division of amplitude. For division-of-amplitude interferometers a beam splitter of some type is used to pick off a portion of the amplitude of the radiation which is then combined with a second portion of the amplitude. The visibility of the resulting interference fringes is a maximum when the amplitudes of the two interfering beams are equal. See INTERFERENCE OF WAVES. Michelson interferometer. The Michelson interferometer (Fig. 3) is based on division of amplitude. Light from an extended source S is incident on a partially reflecting plate (beam splitter) P1. The light transmitted through P1 reflects off mirror M1 back to plate P1. The light which is reflected proceeds to M2 which reflects it back to P1. At P1, the two waves are again partially reflected and partially transmitted, and a portion of each wave proceeds to the receiver R, which may be a screen, a photocell, or a human
eye. Depending on the difference between the distances from the beam splitter to the mirrors M1 and M2, the two beams will interfere constructively or destructively. Plate P2 compensates for the thickness of P1. Often when a quasimonochromatic light source is used with the interferometer, compensating plate P2 is omitted. The function of the beam splitter is to superimpose (image) one mirror onto the other. When the mirrors' images are completely parallel, the interference fringes appear circular. If the mirrors are slightly inclined about a vertical axis, vertical fringes are formed across the field of view. These fringes can be formed in white light if the path difference in part of the field of view is made zero. Just as in other interference experiments, only a few fringes will appear in white light, because the difference in path will be different for wavelengths of different colors. Accordingly, the fringes will appear colored close to zero path difference, and will disappear at larger path differences where the fringe maxima and minima for the different wavelengths overlap. If light reflected off the beam splitter experiences a one-half-cycle relative phase shift, the fringe of zero path difference is black, and can be easily distinguished from the neighboring fringes. This makes use of the instrument relatively easy.

The Michelson interferometer can be used as a spectroscope. Consider first the case of two close spectrum lines as a light source for the instrument. As the mirror M1 is shifted, fringes from each spectral line will cross the field. At certain path differences between M1 and M2, the fringes for the two spectral lines will be out of phase and will essentially disappear; at other points they will be in phase and will be reinforced. By measuring the distance between successive maxima in fringe contrast, it is possible to determine the wavelength difference between the lines. This is a simple illustration of a very broad use for any two-beam interferometer. As the path length L is changed, the variation in intensity I(L) of the light coming from an interferometer gives information on the basis of which the spectrum of the input light can be derived. The equation for the intensity of the emergent energy can be written as Eq. (1), where β is a constant, and I(λ) is the intensity of the incident light at different wavelengths λ:

I(L) = ∫₀^∞ I(λ) cos²(βL/λ) dλ    (1)

This equation applies when the mirror M1 is moved linearly with time from the position where the path difference with M2 is zero, to a position which depends on the longest wavelength in the spectrum to be examined. From Eq. (1), it is possible mathematically to recover the spectrum I(λ). In certain situations, such as in the infrared beyond the wavelength region of 1.5 micrometers, this technique offers a large advantage over conventional spectroscopy in that its utilization of light is extremely efficient. See INFRARED SPECTROSCOPY.
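The use of Eq. (1) as a spectroscope can be illustrated with a toy calculation: an input of two close spectral lines produces an interferogram I(L) whose fringe contrast beats, and the line separation follows from the spacing of the contrast maxima. The sketch below simply takes the constant β as 2π so that βL/λ is an ordinary phase; the line wavelengths are illustrative.

```python
import math

# Two close, equally strong spectral lines (sodium D doublet; illustrative)
lam1, lam2 = 589.0e-9, 589.6e-9
beta = 2 * math.pi        # assumed value of the constant in Eq. (1)

def interferogram(L):
    """Eq. (1) evaluated for a source made up of the two discrete lines."""
    return math.cos(beta * L / lam1) ** 2 + math.cos(beta * L / lam2) ** 2

# Successive maxima of fringe contrast are separated by lam1*lam2 / (2*(lam2 - lam1)).
period = lam1 * lam2 / (2 * (lam2 - lam1))
print(f"spacing of contrast maxima: {period * 1e3:.3f} mm")

# Fringe contrast is strong at L = 0 and L = period, and washes out halfway between.
for L0 in (0.0, period / 2, period):
    samples = [interferogram(L0 + k * 1e-9) for k in range(300)]
    contrast = max(samples) - min(samples)
    print(f"L = {L0 * 1e3:6.3f} mm   fringe contrast ~ {contrast:.2f}")
```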
Fig. 3. Michelson interferometer.
Fig. 4. Twyman-Green interferometer for testing flat surfaces.
Fig. 5. Interferogram obtained with the use of a Twyman-Green interferometer to test a flat surface.
Twyman-Green interferometer. If the Michelson interferometer is used with a point source instead of an extended source, it is called a Twyman-Green interferometer. The use of the laser as the light source for the Twyman-Green interferometer has made it an extremely useful instrument for testing optical components. The great advantage of a laser source is that it makes it possible to obtain bright, good-contrast interference fringes even if the path lengths for the two arms of the interferometer are quite different. See LASER.
Figure 4 shows a Twyman-Green interferometer for testing a flat mirror. The laser beam is expanded to match the size of the sample being tested. Part of the laser light is transmitted to the reference surface, and part is reflected by the beam splitter to the flat surface being tested. Both beams are reflected back to the beam splitter, where they are combined to form interference fringes. An imaging lens projects the surface under test onto the observation plane. Fringes (Fig. 5) show defects in the surface being tested. If the surface is perfectly flat, then straight, equally spaced fringes are obtained. Departure from the straight, equally spaced condition shows directly how the surface differs from being perfectly flat. For a given fringe, the difference in optical path between light going from laser to reference surface to observation plane and the light going from laser to test surface to observation plane is a constant. (The optical path is equal to the product of the geometrical path and the refractive index.) Between adjacent fringes (Fig. 5), the optical path difference changes by one wavelength, which for a helium-neon laser corresponds to 633 nm. The number of straight, equally spaced fringes and their orientation depend upon the tip-tilt of the reference mirror. That is, by tipping or tilting the reference mirror the difference in optical path can be made to vary linearly with distance across the laser beam. Deviations from flatness of the test mirror also cause optical path variations. A height change of half a wavelength will cause an optical path change of one wavelength and a deviation from fringe straightness of one fringe. Thus, the fringes give surface height information, just as a topographical map gives height or contour information. The existence of the essentially straight fringes provides a means of measuring surface contours relative to a tilted plane. This tilt is generally introduced to indicate the sign of the surface error, that is, whether the errors correspond to a hill or a valley. One way to get this sign information is to push in on the piece being tested when it is in the interferometer. If the fringes move toward the right when the test piece is pushed toward the beam splitter, then fringe deviations from straightness toward the right correspond to high points (hills) on the test surface and deviations to the left correspond to low points (valleys).
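The fringe-to-height conversion just described is a one-line calculation: a deviation from straightness of Δ fringes corresponds to a surface height error of Δ·λ/2. A minimal sketch for the helium-neon wavelength mentioned above:

```python
wavelength = 633e-9   # He-Ne laser wavelength, m

def surface_height_error(fringe_deviation):
    """Surface height error for a given deviation from fringe straightness,
    measured in fringes (one fringe of deviation = lambda/2 of height)."""
    return fringe_deviation * wavelength / 2

for deviation in (0.1, 0.25, 1.0):
    print(f"{deviation:4.2f} fringe -> {surface_height_error(deviation) * 1e9:6.1f} nm")
```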
reference mirror
laser
diverger lens
test mirror
interferogram Fig. 6. Twyman-Green interferometer for testing spherical mirrors or lenses.
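The conversion from fringe deviation to surface height described above is simple arithmetic: one fringe of deviation corresponds to half a wavelength of surface height. The sketch below applies this to a few hypothetical fringe readings; the 633-nm helium-neon wavelength is the one quoted in the text, while the deviation values are invented for illustration.

# Minimal sketch: converting fringe deviations in a Twyman-Green interferogram
# into surface-height errors (one fringe of deviation = lambda/2 of height).
wavelength_nm = 633.0                          # helium-neon laser line
fringe_deviation = [0.00, 0.15, -0.30, 0.50]   # hypothetical deviations, in fringes

for d in fringe_deviation:
    height_nm = d * wavelength_nm / 2
    print(f"{d:+.2f} fringe -> {height_nm:+.1f} nm of surface height")
# A 0.50-fringe deviation corresponds to about 158 nm (lambda/4) of height error.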
The basic Twyman-Green interferometer (Fig. 4) can be modified (Fig. 6) to test concave-spherical mirrors. In the interferometer, the center of curvature of the surface under test is placed at the focus of a high-quality diverger lens so that the wavefront is reflected back onto itself. After this retroreflected wavefront passes through the diverger lens, it will be essentially a plane wave, which, when it interferes with the plane reference wave, will give interference fringes similar to those shown in Fig. 5 for testing flat surfaces. In this case the fringe pattern indicates how the concave-spherical mirror differs from the desired shape. Likewise, a convex-spherical mirror can be tested. Also, if a high-quality spherical mirror is used, the high-quality diverger lens can be replaced with the lens to be tested. Fizeau interferometer. One of the most commonly used interferometers in optical metrology is the Fizeau interferometer, which can be thought of as a folded Twyman-Green interferometer. In the Fizeau, the two surfaces being compared, which can be flat, spherical, or aspherical, are placed in close contact. The light reflected off these two surfaces produces interference fringes. For each fringe, the separation between the two surfaces is a constant. If the two surfaces match, straight, equally spaced fringes result. Surface height variations between the two surfaces cause the fringes to deviate from straightness or equal separations, where one fringe deviation from straightness corresponds to a variation in separation between the two surfaces by an amount equal to one-half of the wavelength of the light source used in the interferometer. The wavelength of a helium source, which is often used in a Fizeau interferometer, is 587.56 nm; hence one fringe corresponds to a height variation of approximately 0.3 µm. Mach-Zehnder interferometer. The Mach-Zehnder interferometer (Fig. 7) is a variation of the Michelson interferometer and, like the Michelson interferometer, depends on amplitude splitting of the wavefront. Light enters the instrument and is reflected and transmitted by the semitransparent mirror M1.
Fig. 7. Mach-Zehnder interferometer.

Fig. 8. Lateral shear interferometer.
The reflected portion proceeds to M3, where it is reflected through the cell C2 to the semitransparent mirror M4. Here it combines with the light transmitted by M1 to produce interference. The light transmitted by M1 passes through a cell C1, which is similar to C2 and is used to compensate for the windows of C2. The major application of this instrument is in studying airflow around models of aircraft, missiles, or projectiles. The object and associated airstream are placed in one arm of the interferometer. Because the air pressure varies as it flows over the model, the index of refraction varies, and thus the effective path length of the light in this beam is a function of position. When the variation is an odd number of half-waves, the light will interfere destructively and a dark fringe will appear in the field of view. From a photograph of the fringes, the flow pattern can be mathematically derived. A major difference between the Mach-Zehnder and the Michelson interferometer is that in the Mach-Zehnder the light goes through each path in the instrument only once, whereas in the Michelson the light traverses each path twice. This double traversal makes the Michelson interferometer extremely difficult to use in applications where spatial location of index variations is desired. The incoming and outgoing beams tend to travel over slightly different paths, and this lowers the resolution because of the index gradient across the field. Shearing interferometers. In a lateral-shear interferometer, an example of which is shown in Fig. 8, a wavefront is interfered with a shifted version of itself. A bright fringe is obtained at the points where the slope of the wavefront times the shift between the two wavefronts is equal to an integer number of wavelengths. That is, for a given fringe the slope or derivative of the wavefront is a constant. For this reason a lateral-shear interferometer is often called a differential interferometer. Another type of shearing interferometer is a radial-shear interferometer. Here, a wavefront is interfered with an expanded version of itself. This interferometer is sensitive to radial slopes. The advantages of shearing interferometers are that they are relatively simple and inexpensive, and since the reference wavefront is self-generated, an external wavefront is not needed. Since an external reference beam is not required, the source
requirements are reduced from those of an interferometer such as a Twyman-Green. For this reason, shearing interferometers, in particular lateral-shear interferometers, are finding much use in applications such as adaptive optics systems for correction of atmospheric turbulence where the light source has to be a star, or planet, or perhaps just reflected sunlight. See ADAPTIVE OPTICS. Michelson stellar interferometer. A Michelson stellar interferometer can be used to measure the diameter of stars which are as small as 0.01 second of arc. This task is impossible with a ground-based optical telescope since the atmosphere limits the resolution of the largest telescope to not much better than 1 second of arc. The Michelson stellar interferometer is a simple adaptation of Young's two-slit experiment. In its first form, two slits were placed over the aperture of a telescope. If the object being observed were a true point source, the image would be crossed with a set of interference bands. A second point source separated by a small angle from the first would produce a second set of fringes. At certain values of this angle, the bright fringes in one set will coincide with the dark fringes in the second set. The smallest angle α at which the coincidence occurs will be that angle subtended at the slits by the separation of the peak of the central bright fringe from the nearest dark fringe. This angle is given by Eq. (2), where d is the separation of the slits, λ the dominant wavelength of the two sources, and α their angular separation.

α = λ/(2d)        (2)

The measurement of the separation of the sources is performed by adjusting the separation d between the slits until the fringes vanish. Consider now a single source in the shape of a slit of finite width. If the slit subtends an angle at the telescope aperture which is larger than α, the interference fringes will be reduced in contrast. For various line elements at one side of the slit, there will be elements of angle α which will cancel the fringes from the first element. By induction, it is clear that for a separation d such that the slit source subtends the angle given by Eq. (3), the fringes from a single slit will vanish completely.

α = λ/d        (3)

For additional information on the Michelson stellar interferometer see DIFFRACTION.

Fig. 9. Fabry-Perot interferometer.
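As a worked example of Eq. (2) above, the sketch below computes the slit (or mirror) separation d at which the fringes from a double source of angular separation α first vanish. The 0.01-arcsecond figure is the one quoted in the text; the 550-nm wavelength is an assumed visible-light value.

import math

# Minimal sketch using Eq. (2): d = lambda / (2*alpha).
wavelength = 550e-9                              # m, assumed visible wavelength
alpha_arcsec = 0.01                              # angular separation quoted above
alpha = alpha_arcsec * math.pi / (180 * 3600)    # convert arcseconds to radians

d = wavelength / (2 * alpha)
print(f"required separation d = {d:.2f} m")      # roughly 5.7 m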
Fabry-Perot interferometer. All the interferometers discussed above are two-beam interferometers. The Fabry-Perot interferometer (Fig. 9) is a multiple-beam interferometer since the two glass plates are partially silvered on the inner surfaces, and the incoming wave is multiply reflected between the two surfaces. The position of the fringe maxima is the same for multiple-beam interference as for two-beam interference; however, as the reflectivity of the two surfaces increases and the number of interfering beams increases, the fringes become sharper.
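The sharpening of the fringes with increasing reflectivity can be illustrated numerically. The sketch below uses the standard Airy transmission function for an ideal lossless etalon (a textbook expression assumed here, not derived in this article) together with the finesse expression of Eq. (4) quoted below; the three reflectivity values are arbitrary illustrative choices.

import numpy as np

# Illustrative sketch: fringe sharpening in a lossless Fabry-Perot.
delta = np.linspace(0, 4 * np.pi, 2000)               # round-trip phase
for R in (0.05, 0.50, 0.90):                          # mirror reflectivities
    F_coef = 4 * R / (1 - R) ** 2
    T = 1.0 / (1.0 + F_coef * np.sin(delta / 2) ** 2) # Airy transmission function
    finesse = np.pi * np.sqrt(R) / (1 - R)            # Eq. (4) below
    width = (T > 0.5).mean()                          # fraction of phase axis above half maximum
    print(f"R = {R:.2f}: finesse = {finesse:5.1f}, fractional half-max width = {width:.3f}")
# As R rises from 0.05 to 0.90 the finesse grows from about 0.7 to about 30,
# and the bright fringes occupy a much smaller fraction of each order.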
Fig. 10. Double-exposure holographic interferograms. (a) Interferogram of candle flame. (b) Interferogram of debanded region of honeycomb construction panel. (From C. M. Vest, Holographic Interferometry, John Wiley and Sons, 1979)
Fig. 11. Photograph of time-average holographic interferogram. (From C. M. Vest, Holographic Interferometry, John Wiley and Sons, 1979)

Fig. 12. Diffraction patterns from time-averaged speckle interferogram of a surface vibrating in its own plane with a figure-eight motion. (From J. C. Dainty, Laser Speckle and Related Phenomena, Springer-Verlag, 1975)

A quantity of particular interest in a Fabry-Perot is the ratio of the separation of adjacent maxima to the half-width of the fringes. It can be shown that this ratio, known as the finesse, is given by Eq. (4), where R is the reflectivity of the silvered surfaces.

F = π√R/(1 − R)        (4)

The multiple-beam Fabry-Perot interferometer is of considerable importance in modern optics for spectroscopy. All the light rays incident on the Fabry-Perot at a given angle will result in a single circular fringe of uniform irradiance. With a broad diffuse source, the interference fringes will be narrow concentric rings, corresponding to the multiple-beam transmission pattern. The position of the fringes depends upon the wavelength. That is, each wavelength gives a separate fringe pattern. The minimum resolvable wavelength difference is determined by the ability to resolve close fringes. The ratio of the wavelength λ to the least resolvable wavelength difference Δλ is known as the chromatic resolving power R. At nearly normal incidence it is given by Eq. (5), where n is the refractive index between the

R = λ/(Δλ)min = F(2nd/λ)        (5)
two mirrors separated a distance d. For a wavelength of 500 nm, nd = 10 mm, and R = 90%, the resolving power is well over 106. See REFLECTION OF ELECTROMAGNETIC RADIATION; RESOLVING POWER (OPTICS). When Fabry-Perot interferometers are used with lasers, they are generally used in the central spot scanning mode. The interferometer is illuminated with a collimated laser beam, and all the light transmitted through the Fabry-Perot is focused onto a detector, whose output is displayed on an oscilloscope. Often one of the mirrors is on a piezoelectric mirror mount. As the voltage to the piezoelectric crystal is varied, the mirror separation is varied. The light output as a function of mirror separation gives the spectral frequency content of the laser source. Holographic interferometry. A wave recorded in a hologram is effectively stored for future reconstruction and use. Holographic interferometry is concerned with the formation and interpretation of the fringe pattern which appears when a wave, generated at some earlier time and stored in a hologram, is later reconstructed and caused to interfere with a comparison wave. It is the storage or time-delay aspect which gives the holographic method a unique advantage over conventional optical interferometry. A hologram can be made of an arbitrarily shaped, rough scattering surface, and after suitable processing, if the hologram is illuminated with the same reference wavefront used in recording the hologram, the hologram will produce the original object wavefront. If the hologram is placed back into its original position, a person looking through the hologram will see both the original object and the image of the object stored in the hologram. If the object is now slightly deformed, interference fringes will be produced which tell how much the surface is deformed. Between adjacent fringes the optical path between the source and viewer has changed by one wavelength. While the actual shape of the object is not determined, the change in the shape of the object is measured to within a small fraction of a wavelength, even though the object’s surface is rough compared to the wavelength of light. Double-exposure. Double-exposure holographic interferometry (Fig. 10) is similar to real-time holographic interferometry described above, except now two exposures are made before processing: one exposure with the object in the undeformed state and a second exposure after deformation. When the hologram reconstruction is viewed, interference fringes
will be seen which show how much the object was deformed between exposures. The advantage of the double-exposure technique over the real-time technique is that there is no critical replacement of the hologram after processing. The disadvantage is that continuous comparison of surface displacement relative to an initial state cannot be made, but rather only the difference between two states is determined. Time-average. In time-average holographic interferometry (Fig. 11) a time-average hologram of a vibrating surface is recorded. If the maximum amplitude of the vibration is limited to some tens of light wavelengths, illumination of the hologram yields an image of the surface on which is superimposed several interference fringes which are contour lines of equal displacement of the surface. Time-average holography enables the vibrational amplitudes of diffusely reflecting surfaces to be measured with interferometric precision. See HOLOGRAPHY. Speckle interferometry. A random intensity distribution, called a speckle pattern, is generated when light from a highly coherent source, such as a laser, is scattered by a rough surface. The use of speckle patterns in the study of object displacements, vibration, and distortion is becoming of more importance in the nondestructive testing of mechanical components. For example, time-averaged speckle photographs can be used to analyze the vibrations of an object in its plane. In physical terms the speckles in the image are drawn out into a line as the surface vibrates, instead of being double as in the double-exposure technique. The diffraction pattern of this smeared-out speckle-pattern recording is related to the relative time spent by the speckle at each point of its trajectory (Fig. 12). See NONDESTRUCTIVE EVALUATION. Speckle interferometry can be used to perform astronomical measurements similar to those performed by the Michelson stellar interferometer. Stellar speckle interferometry is a technique for obtaining diffraction-limited resolution of stellar objects despite the presence of the turbulent atmosphere that limits the resolution of ground-based telescopes to approximately 1 second of arc. For example, the diffraction limit of the 200-in.-diameter (5-m) Palomar Mountain telescope is approximately 0.02 second of arc, 1/50 the resolution limit set by the atmosphere. The first step of the process is to take a large number, perhaps 100, of short exposures of the object, where each photo is taken for a different realization of the atmosphere. Next the optical diffraction pattern, that is, the squared modulus of the Fourier transform, of each of the short-exposure photographs is computed, and these patterns are added. By taking a further Fourier transform of the ensemble-average diffraction pattern, the ensemble average of the spatial autocorrelation of the diffraction-limited image of the object is obtained. See SPECKLE. Phase-shifting interferometry. Electronic phase-measurement techniques can be used in interferometers such as the Twyman-Green, where the phase
distribution across the interferogram is being measured. Phase-shifting interferometry is often used for these measurements since it provides for rapid precise measurement of the phase distribution. In phase-shifting interferometry, the phase of the reference beam in the interferometer is made to vary in a known manner. This can be achieved, for example, by mounting the reference mirror on a piezoelectric transducer. By varying the voltage on the transducer, the reference mirror is moved a known amount to change the phase of the reference beam a known amount. A solid-state detector array is used to detect the intensity distribution across the interference pattern. This intensity distribution is read into computer memory three or more times, and between each intensity measurement the phase of the reference beam is changed a known amount. From these three or more intensity measurements, the phase across the interference pattern can be determined to within a fraction of a degree. James C. Wyant Bibliography. P. Hariharan, Basics of Interferometry, 1991; E. Hecht and A. Zajac, Optics, 3d ed., 1997; G. Hernandez, Fabry-Perot Interferometers, 1988; D. Malacara (ed.), Optical Shop Testing, 2d ed., 1992; P. K. Rastogi, Holographic Interferometry: Principles and Methods, 1994; R. S. Sirohi (ed.), Speckle Metrology, 1993; W. H. Steel, Interferometry, 2d ed., 1985.
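A common special case of the phase-shifting procedure just described is the four-step algorithm, in which the reference phase is advanced by 90° between successive frames. The sketch below is a generic illustration of that algorithm with synthetic data; it is not taken from any particular instrument or from the references cited above.

import numpy as np

# Minimal sketch of four-step phase-shifting interferometry (a special case of
# the "three or more" intensity measurements described in the text). The test
# phase map, background, and fringe contrast are arbitrary synthetic values.
rng = np.random.default_rng(0)
phase_true = rng.uniform(-np.pi, np.pi, size=(4, 4))   # unknown phase at each pixel
I0, gamma = 1.0, 0.6                                   # background and fringe contrast

# Four interferograms with reference-phase shifts of 0, 90, 180, and 270 degrees
I1, I2, I3, I4 = [I0 * (1 + gamma * np.cos(phase_true + k * np.pi / 2)) for k in range(4)]

phase_measured = np.arctan2(I4 - I2, I1 - I3)          # four-step phase formula
print(np.allclose(phase_measured, phase_true))         # True: the phase map is recovered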
Interhalogen compounds
The elements of the halogen family (fluorine, chlorine, bromine, and iodine) possess an ability to react with each other to form a series of binary interhalogen compounds (or halogen halides) of general composition XYn, where n can have the values 1, 3, 5, and 7, and where X is the heavier (less electronegative) of the two elements. All possible diatomic compounds of the first four halogens have been prepared. In the other groups a varying number of the possible combinations is absent. Although attempts have been made to prepare ternary interhalogens, they have been unsuccessful; there is considerable doubt that such compounds can exist. See HALOGEN ELEMENTS. Formation. In general, interhalogen compounds are formed when the free halogens are mixed as gases, or, as in the case of the iodine chlorides and bromides, by reacting solid iodine with liquid chlorine or bromine. Most of the nonfluorinated interhalogens also readily form when solutions of the halogens in an inert solvent (for example, carbon tetrachloride) are mixed. It is also possible to form them by the reaction of a halogen with a salt of a more electropositive halogen, such as KI + Cl2 → KCl + ICl. Higher polyhalides can also be prepared by reacting a more electronegative halogen with a corresponding halogen halide, for example, ICl + Cl2 → ICl3 or ClF3 + F2 → ClF5. Chlorine pentafluoride can also be prepared by reacting an MClF4 salt with fluorine (M = alkali metal), MClF4 + F2 → MF + ClF5. A list of known interhalogen compounds and some of their
TABLE 1. Known interhalogen compounds

XY compounds:
  ClF    mp −154°C (−245°F)      bp −101°C (−150°F)
  BrF    mp ∼−33°C (−27°F)       bp ∼20°C (68°F)
  IF     mp —                    bp —
  BrCl   mp ∼−54°C (−65°F)       bp —
  ICl†   mp 27.2°C (81°F) (α)    bp ∼100°C (212°F)
  IBr    mp 40°C (104°F)         bp 119°C (246°F)

XY3 compounds:
  ClF3   mp −76°C (−105°F)       bp 12°C (54°F)
  BrF3   mp 8.77°C (47.8°F)      bp 125°C (257°F)
  IF3    mp −28°C (−18°F)        bp —
  ICl3*  mp 101°C (214°F)        bp —

XY5 compounds:
  ClF5   mp −103°C (−153°F)      bp −14°C (6.8°F)
  BrF5   mp −62.5°C (−80.5°F)    bp 40.3°C (105°F)
  IF5    mp 10°C (50°F)          bp 101°C (214°F)

XY7 compounds:
  IF7    4.77°C (40.6°F) (sublimes)

*In the solid state the compound forms a dimer.
†Unstable β-modification exists, mp 14°C (57°F).
physical properties is given in Table 1. Interhalogen compounds containing astatine have not been isolated as yet, although the existence of AtI and AtBr has been demonstrated by indirect measurements. Stability. Thermodynamic stability of the interhalogen compounds varies within rather large limits. In general, for a given group the stability increases with increasing difference in electronegativity between the two halogens. Thus for XY group the free energy of formation of the interhalogens, relative to the elements in their standard conditions, falls in the following order: IF > BrF > ClF > ICl > IBr > BrCl. It should be noted, however, that the fluorides of this series can be obtained only in minute quantities since they readily undergo disproportionation reaction, for example, 5IF → 2I2 + IF5. The least-stable compound, bromine chloride, has only recently been isolated in the pure state. Decrease of stability with decreasing difference of electronegativity is readily apparent in the higher interhalogens since compounds such as BrCl3, IBr3, or ICl5 are unknown. The only unambiguously prepared interhalogen containing eight halogen atoms is IF7. See ELECTRONEGATIVITY. Reactivity. The reactivity of the polyhalides reflects the reactivity of the halogens they contain. In general, they behave as strong oxidizing and halogenating agents. Most halogen halides (especially halogen fluorides) readily attack metals, yielding the corresponding halide of the more electronegative halogen. In the case of halogen fluorides, the reaction results in the formation of the fluoride, in which the metal is often found in its highest oxidation state, for example, AgF2, CoF3, and so on. Noble metals, such as platinum, are resistant to the attack of the interhalogens at room temperature. Halogen fluorides
are often handled in nickel vessels, but in this case the resistance to attack is due to the formation of a protective layer of nickel(II) fluoride. All halogen halides readily react with water. Such reactions can be quite violent and, with halogen fluorides, they may be explosive. Reaction products vary depending on the nature of the interhalogen compound. For example, in the case of chlorine trifluoride, reaction with an excess of water yields HF, Cl2, and O2 as reaction products. The same reactivity is observed with organic compounds. The nonfluorinated interhalogens do not react with completely halogenated hydrocarbons, and solutions of ICl, IBr, and BrCl are quite stable in dry carbon tetrachloride or hexachloroethane, as well as in fluorocarbons, as long as the solvents are very dry. They readily react with aliphatic and aromatic hydrocarbons and with oxygen- or nitrogen-containing compounds. The reaction rates, however, can be rather slow, and dilute solutions of ICl can be stable for several hours in solvents such as nitrobenzene. Halogen fluorides usually react vigorously with chlorinated hydrocarbons, although IF5 can be dissolved in carbon tetrachloride and BrF3 can be dissolved in Freon 113 without decomposition. All halogen fluorides react explosively with easily oxidizable organic compounds. See FLUORINE. Halogen halides, like halogens, act as Lewis acids and under proper experimental conditions may form a series of stable complexes with various organic electron donors. For example, mixing of carbon tetrachloride solutions of pyridine and of iodine monochloride leads to the formation of a solid complex, C5H5N·ICl. The same reaction can occur with other heterocyclic amines and ICl, IBr, or ICl3. Addition compounds of organic electron donors with IF, IF3, and IF5 have been reported. In all cases it is the
iodine atom which is directly attached to the donor atom. A number of interhalogen compounds conduct electrical current in the liquid state. Among these are ICl and BrF3. For example, the electrical conductance of molten iodine monochloride is comparable to that of a concentrated aqueous solution of a strong electrolyte (4.52 × 10⁻³ ohm⁻¹ cm⁻¹ at 30.6°C or 87.08°F). The conductances, however, are much smaller than those of fused salts, and therefore it can be concluded that the bonding in these compounds is largely covalent. Electrical conductance is due to self-ionization reactions, as shown in reactions (1) and (2).

3ICl ⇌ I2Cl+ + ICl2−        (1)

2BrF3 ⇌ BrF2+ + BrF4−        (2)
The above behavior leads to the possibility of studying acid-base reactions. In these systems an acid is any compound which generates the solvo-cation, while a base would generate solvo-anions. Thus SbF5 would be an acid in liquid bromine trifluoride, reaction (3), while an electrovalent fluoride would be a base, reaction (4).

SbF5 + BrF3 ⇌ BrF2+ + SbF6−        (3)

KF + BrF3 ⇌ K+ + BrF4−        (4)
See SUPERACID. The analogy with acid-base reactions in water is obvious, as shown by reactions (5) and (6).

K+OH− + H3O+Cl− → K+Cl− + 2H2O        (5)

K+BrF4− + BrF2+SbF6− → KSbF6 + 2BrF3        (6)

Such relations have been studied in BrF3, ClF3, and IF5. Numerous salts of the interhalogen acid-base systems have been isolated and studied. Thus, many compounds containing either interhalogen anions (solvo-anions) or interhalogen cations (solvo-cations) have been formed simply by adding the appropriate acid or base to a liquid halogen halide or a halogen halide in an appropriate nonaqueous solvent. In addition, cations derived from previously unknown compounds can be prepared by using powerful oxidizing agents, such as KrF+ salts.
TABLE 2. Known interhalogen anions
  Three-membered: ClF2−, BrF2−, ICl2−, IBr2−, IBrCl−, BrCl2−, I2Cl−, Br2Cl−, I2Br−
  Five-membered: ClF4−, BrF4−, IF4−, ICl4−, I2Cl3−, I2Cl2Br−, I2ClBr2−, I4Cl−
  Seven-membered: ClF6−, BrF6−, IF6−
  Nine-membered: IF8−

TABLE 3. Known interhalogen cations
  Three-membered: Cl2F+, ClF2+, BrF2+, ICl2+, I2Cl+, IBr2+, I2Br+, BrCl2+, Br2Cl+, IBrCl+
  Five-membered: ClF4+, BrF4+, IF4+
  Seven-membered: ClF6+, BrF6+
For example, even though the parent fluoride BrF7 is unknown to date, a compound containing the BrF6+ cation has been prepared according to reaction (7).

BrF5 + KrF+AsF6− → BrF6+AsF6− + Kr        (7)

Pentahalides can also be formed by the addition of an interhalogen compound to a trihalide ion, as shown in reaction (8).

ICl + ICl2− → I2Cl3−        (8)
A compilation of interhalogen anions and cations which have been previously prepared is given in Tables 2 and 3. Terry Surles Bibliography. V. Gutmann (ed.), MTP International Review of Science: Inorganic Chemistry, Ser. 1, vol. 3, 1975.
Intermediate-frequency amplifier An amplifying circuit in a radio-frequency (RF) receiver that processes and enhances a downconverted or modulated signal. Signal frequency spectrum downconversion is achieved by multiplying the radio-frequency signal by a local oscillator signal in a circuit known as a mixer. This multiplication produces two signals whose frequency content lies about the sum and difference frequencies of the center frequency of the original signal and the oscillator frequency. A variable local oscillator is used in the receiver to hold the difference-signal center frequency constant as the receiver is tuned. The constant frequency of the downconverted signal is called the intermediate frequency (IF), and it is this signal that is processed by the intermediate-frequency amplifier. Unfortunately, radio-frequency signals both higher and lower than the local oscillator frequency by a difference equal to the intermediate frequency will produce the intermediate frequency. One of these is the desired signal; the undesired signal is called an image. See MIXER; OSCILLATOR. Superheterodyne receiver. Aside from demodulation and conversion, the purpose of each stage of a radio receiver is to improve the signal-to-noise ratio (SNR) through a combination of signal amplification and noise/interference suppression. The first stage of a superheterodyne receiver is generally a broadband tuned radio-frequency amplifier intended to improve the signal-to-noise ratio of a selected signal whose
frequency lies anywhere within the receiver's tuning range. The second stage is a variable-frequency mixer or downconverter that shifts the frequency of the signal of interest to the predetermined intermediate frequency by use of a local oscillator (LO). Waveform impurities in the LO signal and nonlinearities in the RF amplifier and mixer may interact with the received signal to generate undesirable false signals that are replicas or images of the received signal but displaced in frequency. The third stage of that receiver is a high-quality intermediate-frequency amplifier that has been optimized to improve the quality of the signal in a very narrow frequency band about the fixed intermediate frequency by greatly attenuating out-of-band noise, adjacent-channel signals, and images. More sophisticated digital intermediate-frequency stages can also reduce broadband interference. Unlike the broadband tunable radio-frequency amplifier, the intermediate-frequency amplifier is designed to operate over a narrow band of frequencies centered about a dedicated fixed frequency (the intermediate frequency); therefore, the intermediate-frequency amplifier can be an extremely efficient stage. If the intermediate frequency is on the order of a few megahertz, the undesirable images may be efficiently rejected, but narrow-band filtering for noise and adjacent-channel-signal rejection is difficult and expensive because of the high ratio of the intermediate frequency to the bandwidth of the intermediate-frequency amplifier. If the intermediate frequency is much smaller, say, on the order of a few hundred kilohertz, then inexpensive and more selective filters are possible that can separate the desired signal from closely packed adjacent signals, but they do not reject images very well. A high-quality double-conversion receiver combines the best of both approaches by cascading both high- and low-frequency intermediate-frequency stages that are separated by a second fixed-frequency mixer. The last intermediate-frequency stage of a digital or analog superheterodyne receiver is followed by the detector, which extracts the information from the signal. See HETERODYNE PRINCIPLE; RADIO-FREQUENCY AMPLIFIER; RADIO RECEIVER; TELEVISION RECEIVER. Applications. The superheterodyne structure is common for television, ground-based and satellite communications, cell phones, ground-based and airborne radar, navigation, and many other receivers. The intermediate-frequency amplifier function is ubiquitous. Gain and bandwidth. The measure of quality of an intermediate-frequency amplifier is how well it forms a "window" in the frequency spectrum for observing signals of interest and rejecting all other signals (including images) and noise. Maximum gain is usually at the intermediate frequency. The intermediate-frequency-amplifier gain decreases at frequencies both higher and lower than the intermediate frequency. The difference between the upper and lower frequencies at which the signal power drops to one-half of the maximum signal power (usually at the intermediate frequency) is known as the selectiv-
ity or the 6-dB bandwidth. The difference between the upper and lower frequencies at which the signal power drops to one-thousandth of the maximum signal power is known as the 60-dB bandwidth. The ratio of the 60-dB bandwidth and the 6-dB bandwidth is known as the shape factor, a number that must always be greater than one. A small shape factor indicates good noise rejection because the filter rejects any signal outside of the bandwidth of interest. The best selectivity matches the bandwidth of the signal. Because a single receiver may be used for several different applications, a different intermediatefrequency filter may be employed for each application. The filter requirements vary widely with application. For example, signal bandwidth for broadcast-quality frequency modulation (FM) may be in the tens of kilohertz; for communicationsquality FM, 7–10 kHz; for broadcast-quality amplitude modulation (AM), 5 kHz; for communicationsquality AM, 3 kHz; for single sideband (SSB), 1–2 kHz; and for continuous-wave (CW) communications (radio-telegraphy), under 100 Hz. See BANDWIDTH REQUIREMENTS (COMMUNICATIONS); GAIN; SELECTIVITY. Analog IF amplifiers. Analog (or continuous-time) intermediate-frequency amplifiers consist of narrowband tuned circuits that filter out nearly all signals and noise except the desired signal at the intermediate frequency, and amplification to increase the signal strength. The original tuned-circuit structures were made of capacitors and inductors. Today virtually all high-Q intermediate-frequency filters are only a part of a monolithic integrated circuit that contains the entire radio receiver, including fully integrated RC/op-amp active filters, piezoelectric crystal filters, magnetostrictive lines, and microelectro-mechanical systems (MEMS). See ELECTRIC FILTER; INTEGRATED CIRCUITS; MAGNETOSTRICTION; MICRO-ELECTRO-MECHANICAL SYSTEMS (MEMS); OPERATIONAL AMPLIFIER; PIEZOELECTRICITY; Q (ELECTRICITY). Digital IF amplifiers. This misnomer came to be because the concept is easy to grasp in view of its parallel to analog IF amplifiers. Either digital IF stage or digital IF filter would be a more appropriate name. The current goal of radio-frequency system design is to make an analog-to-digital converter (ADC) directly connected to the antenna to permit realization of an all-digital radio receiver. Fast devices permit analog-to-digital converters that operate at near-terahertz (1012 Hz) speed. In 1999 such devices were still costly and power-hungry, but the same was true of bit-serial near-megahertz devices around 1970. There now exist reasonably priced and powered radio receivers that consist of an analog radiofrequency amplifier and first converter, followed by a high-speed analog-to-digital converter, which is connected to the remainder of the digital-receiver processing, beginning with the intermediate-frequency stage. The digital-receiver function can be produced as a dedicated integrated circuit or as a part of a programmed microprocessor, depending on the frequency range of interest. See MICROPROCESSOR.
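As a concrete illustration of the digital approach, the sketch below builds a simple windowed-sinc FIR band-pass filter around an assumed intermediate frequency and evaluates the 6-dB and 60-dB bandwidths and the shape factor defined above. The sample rate, center frequency, bandwidth, and filter length are illustrative choices only, and a windowed-sinc design is just one of many ways such a filter could be realized.

import numpy as np

# Minimal numpy-only sketch of a digital IF band-pass filter and its shape factor.
fs = 2.0e6            # sample rate, Hz (assumed)
f_if = 455.0e3        # intermediate frequency, Hz (a common broadcast value)
half_bw = 10.0e3      # desired half-bandwidth, Hz

N = 401                                   # filter length (odd)
n = np.arange(N) - (N - 1) / 2
def lowpass(fc):                          # ideal low-pass impulse response
    return 2 * fc / fs * np.sinc(2 * fc / fs * n)
h = (lowpass(f_if + half_bw) - lowpass(f_if - half_bw)) * np.blackman(N)

# Frequency response, normalized to the peak, in decibels
H = np.fft.rfft(h, 1 << 16)
f = np.fft.rfftfreq(1 << 16, d=1 / fs)
mag_db = 20 * np.log10(np.abs(H) / np.abs(H).max() + 1e-12)

def bandwidth(level_db):                  # width of the region above level_db
    above = f[mag_db >= level_db]
    return above.max() - above.min()

bw6, bw60 = bandwidth(-6.0), bandwidth(-60.0)
print(f"6-dB bandwidth  = {bw6 / 1e3:.1f} kHz")
print(f"60-dB bandwidth = {bw60 / 1e3:.1f} kHz")
print(f"shape factor    = {bw60 / bw6:.2f}")   # always greater than 1

In a real receiver the same structure would simply be reloaded with a different coefficient set to change the selectivity, which is the flexibility discussed in the next paragraph.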
A digital intermediate-frequency stage is far more versatile and controllable than its analog counterpart. For example, while changing the selectivity of an analog filter may require switching to different physical signal paths, changing the selectivity of a digital filter requires only the change of stored numerical filter coefficients. Therefore, selective filtering, provided by a digital intermediate-frequency stage, can be more accurate than that of its analog counterpart, absolutely repeatable, and even program-controlled. Such filtering can be used to attenuate out-of-band noise and to sharply reduce in-band interference by use of individual (or a combination of) programmed, adaptive, and manually tuned notch filters. Furthermore, broadband noise can be greatly attenuated by using a digital adaptive noise suppressor. See AMPLIFIER. Stanley A. White Bibliography. R. C. Dorf (ed.), The Electrical Engineering Handbook, 2d ed., 1997; S. Gibilisco, Handbook of Radio and Wireless Technology, 1998; R. F. Graf, Converter and Filter Circuits, 1997; J. E. Paul, Adaptive Noise-Cancelling Receiver, U.S. Patent No. 4,177,430; M. S. Roden, Analog and Digital Communications Systems, 4th ed., 2000; M. E. Van Valkenburg (ed.), Reference Data for Engineers: Radio, Electronics, Computer, and Communications, 8th ed., 1996.

Intermediate vector boson
One of the three fundamental particles that transmit the weak force. (An example of a weak interaction process is nuclear beta decay.) These elementary particles—the W+, W−, and Z0 particles—were discovered in 1983 in very high energy proton-antiproton collisions. It is through the exchange of W and Z bosons that two particles interact weakly, just as it is through the exchange of photons that two charged particles interact electromagnetically. The intermediate vector bosons were postulated to exist in the 1960s; however, their large masses prevented their production and study at accelerators until 1983. Their discovery was a key step toward unification of the weak and electromagnetic interactions. See ELECTROWEAK INTERACTION; ELEMENTARY PARTICLE; FUNDAMENTAL INTERACTIONS; WEAK NUCLEAR INTERACTIONS. Production and detection. The W and Z particles are roughly 100 times the mass of a proton. Therefore, the experiment to search for the W and the Z demanded collisions of elementary particles at the highest available center-of-mass energy. Such very high center-of-mass energies capable of producing the massive W and Z particles were achieved with collisions of protons and antiprotons at the laboratory of the European Organization for Nuclear Research (CERN) near Geneva, Switzerland. The pp̄ collisions were monitored in two underground experimental areas (UA1 and UA2). See PARTICLE ACCELERATOR; PARTICLE DETECTOR. Properties of W and Z particles. Striking features of both the charged W and the Z0 particles are their large masses. The charged boson (W+ and W−) mass is measured to be about 80 GeV/c², and the neutral boson (Z0) mass is measured to be about 91 GeV/c². (For comparison, the proton has a mass of about 1 GeV/c².) Prior to the discovery of the W and the Z, particle theorists had met with some success in the unification of the weak and electromagnetic interactions. The electroweak theory as it is understood today is due largely to the work of S. Glashow, S. Weinberg, and A. Salam. Based on low-energy neutrino scattering data, which in this theory involves the exchange of virtual W and Z particles, theorists made predictions for the W and Z masses. The actual measured values are in agreement (within errors) with predictions. The discovery of the W and the Z particles at the predicted masses is an essential confirmation of the electroweak theory, one of the cornerstones of the standard model of particle physics. See STANDARD MODEL. Only a few intermediate vector bosons are produced from 10⁹ proton-antiproton collisions at a center-of-mass energy of 540 GeV. This small production probability per pp̄ collision is understood to be due to the fact that the bosons are produced by a single quark-antiquark annihilation. The other production characteristics of the intermediate vector bosons, such as the longitudinal and transverse momentum distributions (with respect to the pp̄ colliding beam axis), provide support for this theoretical picture. See QUARKS. Decay. The decay modes of the W and Z are well predicted. The simple decays W+ → e+ν, W− → e−ν̄, Z0 → e+e−, and Z0 → µ+µ− are spectacular signatures of the intermediate vector bosons (Fig. 1). These leptonic decays are only a few percent of the total number of W and Z decays, however, because the W and Z also decay into
Fig. 1. Observation of the W particle in the UA1 data from the electron transverse energy distribution. The solid curve is the expected distribution for the decay, W → eν, while the broken curve is the distribution expected for the decay of a different particle “X” into an electron and two neutrinos.
Fig. 2. Rates from the L3 experiment at LEP for the process e+e− → Z 0 as a function of total center-of-mass energy of the electron plus positron. The rate is measured as a total cross section. (After O. Adriani et al., Results from the L3 Experiment at LEP, Phys. Rep., 236:1–146, 1993)
quark-antiquark pairs. The W particle also has an identifying feature in its decay, which is due to its production through quark-antiquark annihilation. The quark and antiquark carry intrinsic angular momentum (spin) that is conserved in the interaction. This means that the W is polarized at production; its spin has a definite orientation. This intrinsic angular momentum is conserved when the W decays, yielding a preferred direction for the decay electron or positron. This distribution is characteristic of a weak interaction process, and it determines the assignment of the spin quantum number of the W to be 1, as expected from theory. (Spin-1, or more generally integer-spin, particles are called bosons; another example is the photon.) See QUANTUM STATISTICS; SPIN (QUANTUM MECHANICS). Results from electron-positron annihilation. In 1989, the Z0 particle was produced in electron-positron annihilations at both the Stanford Linear Collider (SLC) and the CERN Large Electron-Positron (LEP) accelerator. The Z0 was first observed in these experiments by its decay into a quark-antiquark (qq̄) pair. The quark and antiquark are not observed as free particles; however, the energetic hadrons originating from them give a clear experimental signature. This process, e+e− → Z0 → qq̄, is the inverse of the fundamental process in which the Z0 particle was discovered, qq̄ → Z0 → e+e−, where the quark and antiquark are constituents of the energetic proton and antiproton. Electron-positron annihilations offer an exceptionally background-free method for the study of the Z0 particle because it is created alone as a pure state. Thus, the LEP machine with its high collision rate is appropriately referred to as a Z0 factory. Since the Z0 particle couples to all known quarks and leptons, the study of its decays also provides information about the properties of all known particles. The mass of the Z particle has been extremely ac-
curately determined at LEP by varying the energy of the electrons and positrons by a few percent and observing the rate of the process e+e− → Z0 → qq̄ as a function of energy (Fig. 2). The result from four LEP experiments is that mZ = 91.1876 GeV/c² with an experimental error of 0.0021 GeV/c². Many details of the decay of Z particles have been measured at LEP with high statistics. All are in agreement with the standard model. The final stages of LEP (LEP II) allowed an increase in machine energy so that pairs of W+ and W− were produced. The W and Z particles have also been extensively studied at the Fermilab Tevatron proton-antiproton collider. Detection of the W particles was the key element in the observation of the top quark at Fermilab. The data from LEP II and Fermilab allowed accurate determination of the W mass. The present value is mW = 80.423 GeV/c² with an experimental error of 0.039 GeV/c². The Z0 particle has an extremely short lifetime of about 10⁻²⁵ s. This is due in part to the large number of channels into which it may decay. The result of this short lifetime is that the Z0 particle is observed as a resonance having a natural width (Γ) of 2.495 GeV/c² (Fig. 2). There is a partial contribution to this width for each decay channel available to the Z0. These partial widths can be calculated theoretically. The particles that are known to contribute to the Z0 width are the electron, muon, and tau leptons (Z0 → e+e−, µ+µ−, or τ+τ−); the up, down, strange, charm, and bottom quarks (Z0 → qq̄); and the electron-, muon-, and tau-neutrinos (Z0 → νν̄). There is good agreement between the measured width and the sum of the theoretical partial widths for each known decay channel. The existence of possible new leptons, quarks, or neutrinos with masses less than about mZ/2 is therefore ruled out. See LEPTON; NEUTRINO. Now the focus in particle physics is to understand why the weak interaction mediators (W and Z) are massive while the electromagnetic interaction mediator (photon) is massless, since the two forces are unified. A new proton-proton accelerator, the Large Hadron Collider (LHC), will come into operation in 2008 to address this puzzle. The center-of-mass collision energy of 14 TeV and the high luminosity of 10³⁴ cm⁻² s⁻¹ will provide, for the first time, interactions of W and Z particles at energies much larger than the mass energies of the bosons, providing the first window to probe the mechanism by which elementary particles obtain their masses. James W. Rohlf Bibliography. G. Arnison et al., Experimental observation of isolated large transverse energy electrons with associated missing energy at √s = 540 GeV, Phys. Lett., 122B:103–116, 1983; G. Arnison et al., Experimental observation of lepton pairs of invariant mass around 95 GeV/c² at the CERN SPS collider, Phys. Lett., 126B:398–410, 1983; The LEP collaborations: Electroweak parameters of the Z0 resonance and the standard model, Phys. Lett., B276:247–253, 1992; J. W. Rohlf, Modern Physics from α to Z0, 1994.
Intermetallic compounds
All metal–metal compounds, including ordered, disordered, binary, and multicomponent. The metal– metal aspect of the definition is often relaxed to include some metal–metalloid compounds such as silicides and tellurides and certain compound semiconductors such as InSb. Such inclusion is appropriate since the phenomenology of many of these compounds is similar to the metal-metal ones. Intermetallic compounds constitute an intriguing field for scientific inquiry and a rewarding source of materials for diverse applications. See METAL; METALLOID; SEMICONDUCTOR. Binary systems. Some of the formation characteristics of intermetallics may be appreciated from examination of the phase diagram such as that for the binary system Al-Ni in Fig. 1. Here it is seen that intermetallics can melt congruently (for example, AlNi), and they form by solid–liquid reactions (such as Al3Ni, Al3Ni2, or AlNi3) or interaction between solids (such as Al3Ni5). The phase diagram also shows that intermetallics can exhibit broad ranges of composition, extending on one or both sides of stoichiometry (for example, AlNi), or may show essentially no deviation from stoichiometry (for example, Al3Ni). These same features also appear in ternary and higher-order systems. Stoichiometric deviation implies the presence of point defects (isolated atoms), including substitutions, vacancies, or interstitials. See CRYSTAL DEFECTS; EUTECTICS; NONSTOICHIOMETRIC COMPOUNDS; PHASE EQUILIBRIUM; SOLID-STATE CHEMISTRY; STOICHIOMETRY. Crystal structure. Another characteristic of intermetallic compounds is that their crystal structures are idiosyncratic, not to be inferred from the structures of the component metals. Aluminium and nickel are face-centered-cubic (fcc) metals, but the five intermetallics shown in Fig. 1 have unique structures, none of them the simple fcc of the component metals. In this system, AlNi3 exhibits an ordered variant of the fcc structure, a so-called superstructure (Fig. 2a), but the others are quite different— orthorhombic, trigonal, or ordered body-centeredcubic (bcc). In this particular system, all the intermetallics remain ordered until the melting point is reached, but in other cases (for example, compounds in the Au-Cu system) the ordering disappears well before the melting point. In a few cases, a discrete compound exists in the system over a limited composition range and exhibits no ordering at all (for example, Ag3Al). Some intermetallic crystal structures are quite simple with a low number of atoms per unit cell. For example, AlNi has the CsCl structure (ordered bcc) with only two atoms per unit cell (Fig. 2b), while others, such as NaCd2 with a simple formula, have very complex structures with hundreds of atoms per unit cell. In general, intermetallic crystal structures tend to be high-symmetry, high-space-filling, and high-coordination-number. Although some 2750 intermetallic crystal structures are known, ∼5000 are potentially possible. The 100 most common crystal structures account for about
Fig. 1. Al-Ni phase diagram. Note that the peritectic formation of AlNi3 lies only 2°C above the eutectic between AlNi and AlNi3. (After J. H. Westbrook, ed., Handbook of Binary Phase Diagrams, Genium Publishing, 2005)
72% of all known intermetallics (∼25,000). The particular structure adopted by a pair of metal atoms depends on their relative size, the atomic number factor, the electrochemical factor, the valence electron factor, and the angular valence–orbital factor. The relative importance of these factors varies from case to case. See ATOM; COORDINATION NUMBER; CRYSTAL STRUCTURE; VALENCE. Ternary systems. For ternary systems, the situation is even more complex and little is known. About 15,000 ternary compounds have been identified, but this is only about 3% of the expected number of the possible ternary intermetallics. Ternary compounds are not just of conceptual interest but also of commercial significance, as seen by the important magnetic phase, Nd2Fe14B, and the embrittling G-phase, Ni16Ti6Si7, sometimes encountered in hightemperature alloys. Complications do not end with ternaries. A seven-component intermetallic (Cu11.5Zn23Ga7.5Ge8As16Se30Br4) with a disordered zinc blende structure has been successfully synthesized. Quasicrystals, structures lacking long-range order but exhibiting quasiperiodicity and icosahedral or decagonal structures, and intermetallic clusters (for example, Ba16Na204Sn310) are also intriguing intermetallics. See ALLOY; ALLOY STRUCTURES; QUASICRYSTAL. Formation. Intermetallic compounds can be prepared in various ways, depending on the particular compound and the intended product. Possible methods include melting and casting, powder metallurgy,
Fig. 2. General cubic lattices of aluminides and silicides. (a) L12 [cP4] (for example, Ni3Al, Ni3Si, Zr3Al), (b) B2 [cP2] (AlNi, FeAl), (c) C1 [cF12] (Mg2Si, CoSi2, NiSi2), (d) A15 [cP8] (Cr3Si, Nb3Al), (e) D03 [cF16] (Fe3Al). (After K. H. J. Buschow et al., eds., The Encyclopedia of Materials: Science and Technology, vol. 5, p. 4177, Pergamon, 2001)
mechanical alloying, diffusion, nanotech synthesis, and molecular-beam epitaxy. They also can occur naturally as minerals—for example, breithauptite (NiSb), domeykite (Cu3As), froodite (PdBi2), drysdallite (MoSe2), nigglite (PtSn), petzite [(Ag,Au)Te2], kolymite (Cu7Hg6), and hapkeite (Fe2Si), to cite a single example in each family. See DIFFUSION; MECHANICAL ALLOYING; POWDER METALLURGY. Applications. Practical use of intermetallics goes far back in time. In prehistoric times, meteoritic NiFe was used for tools and Cu3Au (tumbaga) for fish-
hooks. In Roman times, CuSn was used for mirrors and Cu3Sn in bronze swords and bells. In the eighteenth and nineteenth centuries, CuZn was used for ship sheathing, SbSn in printing type and bearings, and Ag2Hg3 and Sn8Hg in dental amalgams. Today, intermetallics are used in many high-tech applications, such as Ni3Al in jet engine alloys, FeZn3 in automotive galvanized steel sheet, FeAl as a resistive heating element, GaAs in light-emitting diodes and semiconducting substrates, LaNi5 as a battery electrode, (Tb,Dy)Fe2 as a magnetoelastic transducer, and Ni0.46Ti0.54 as a shape memory alloy in the biomedical field. Other specialized applications include protective coatings, catalysts, heat storage materials, thermoelectrics, pyrophoric alloys, and jewelry. The market for intermetallics in the United States has been estimated at $10 billion per year. Other intermetallics. There are a few cases where intermetallics exhibit truly exceptional behavior. LaGa3 is a "plastic" crystal with orientational ordering. Cs4Pb4 undergoes a two-step melting (first loss of rotational order, then loss of positional order). CeSb has the most complex magnetic system known (>17 magnetic structures). Colored intermetallics include CoGa (yellow), CoSi2 (dark blue), AuAl2 (violet), AuLi2Sn (pink), and FeAl (brown), to name some. Pd8U is a thermoelectric with a very large positive thermopower for a metal. SmMn2Ge2 is a magnetoresistor. NaTl is a substance having ionic, covalent, and metallic bonds, all in the same crystal. Mg3Bi2 is a superionic conductor with mobile Mg2+ cations in a metallic framework. And Cu2MnAl is a ferromagnet with no ferromagnetic constituents. J. H. Westbrook Bibliography. F. R. de Boer and D. G. Pettifor (eds.), Cohesion and Structure, North-Holland, 1991; M. Doyama and M. Yabe, Databook of Intermetallic Compounds, Science Forum, Tokyo, 1988; R. D. Dudley and P. D. Desai, Properties of Intermetallic Alloys III, CINDAS, Purdue University, 1995; R. L. Fleischer and J. H. Westbrook, Intermetallic Compounds: Principles and Practice, Wiley, vol. 1, 1996, vol. 2, 1996, vol. 3, 2002; B. P. Gilp and P. D. Desai, Properties of Intermetallic Alloys II, CINDAS, Purdue University, 1994; C. T. Liu, R. W. Cahn, and G. Sauthoff (eds.), Ordered Intermetallics: Physical Metallurgy and Mechanical Behavior, Kluwer, 1992; N. M. Matveeva and E. V. Kozlov, Ordered Phases in Metallic Systems, Nova Science, 1996; National Materials Advisory Board, Intermetallic Alloy Development: A Program Evaluation, NMAB-487-1, National Academy Press, 1997; J. E. Payne and P. D. Desai, Properties of Intermetallic Alloys I, CINDAS, Purdue University, 1994; H. Ringpfeil (ed.), Intermetallische Phasen, VEB Deutscher Verlag, Leipzig, 1976; G. Sauthoff, Intermetallics, VCH, 1995; N. S. Stoloff and V. K. Sikka, Physical Metallurgy and Processing of Intermetallic Compounds, Chapman and Hall, 1996; P. Villars, Pearson's Handbook: Crystallographic Data for Intermetallic Phases, Desk ed., 2 vols., ASM, 1997; H. Warlimont (ed.), Order-Disorder Transformations in Alloys, Springer, 1974.

Intermolecular forces
Attractive or repulsive interactions that occur between all atoms and molecules. Intermolecular forces become significant at molecular separations of about 1 nanometer or less, but are much weaker than the forces associated with chemical bonding. They are important, however, because they are responsible for many of the physical properties of solids, liquids, and gases. These forces are also largely responsible for the three-dimensional arrangements of biological molecules and polymers. Attractive and repulsive forces. The observation that gases condense into liquids and solids at low temperatures, rather than continuing to occupy all the space available to them, is evidence of a force of attraction between their molecules. This attractive force is overcome only when the thermal energy is high enough. The further observation that liquids and solids occupy a finite volume and cannot be compressed easily is evidence of a repulsive force that comes into play when the molecules are close together. Accordingly, we can deduce that the energy of interaction between two molecules or atoms takes a form like that in Fig. 1, which shows the interaction between argon atoms as deduced from experimental data. Argon atoms are electrically neutral and do not form chemical bonds with each other. When two argon atoms are far apart, the energy of interaction is essentially zero, and the atoms do not exert a force on each other. (The force is the negative gradient, or slope, of the potential energy.) As the two atoms approach, the energy between them decreases, reaching a minimum at a distance of 0.38 nm and then increasing again. At distances greater than 0.38 nm, the energy increases with increasing distance, and the force between the molecules is attractive; at shorter distances, the energy increases rapidly with decreasing distance, and the force becomes strongly repulsive. At the minimum, the interaction energy between the argon atoms is about 1.2 kJ/mol, which may be compared with typical chemical bond energies, of the order of 300 kJ/mol. See ARGON; CHEMICAL BONDING. Description. Intermolecular forces can be classified into several types, of which two are universal (as illustrated by the same example of two interacting argon atoms). The attractive force known as dispersion arises from the quantum-mechanical fluctuation of the electron density around the nucleus of each atom. At long distances—greater than 1 nm or so—the electrons of each atom move independently of the other, and the charge distribution is spherically symmetric (Fig. 2a). At shorter distances, an instantaneous fluctuation of the charge density in one atom can affect the other. If the electrons of one atom move briefly to the side nearer the other (Fig. 2b), the electrons of the other atom are repelled to the far side. In this configuration, both atoms have a small dipole moment, and they attract each other electrostatically. At another moment, the electrons may move the other way, but their motions are cor-
Fig. 1. Intermolecular potential energy of two argon atoms. (After R. A. Aziz and H. H. Chen, An accurate intermolecular potential for Ar, J. Chem. Phys., 67:5719-5726, 1977)
related so that an attractive force is maintained on average. Molecular orbital theory shows that the electrons of each atom are slightly more likely to be on the side nearer to the other atom, so that each atomic nucleus is attracted by its own electrons in the direction of the other atom. Fritz London demonstrated in 1930 that at long range the dispersion energy is
Fig. 2. Schematic diagram of intermolecular interaction. (a) There is negligible interaction between atoms that are 1 nm apart. (b) For atoms separated by about 0.8 nm or less, attractive dispersion forces result from correlated fluctuations of the electron charge distributions of the atoms. (c) For atoms closer than 0.3 nm or so, exchange effects cause a distortion of the charge distribution that leads to repulsion.
Intermolecular forces proportional to the inverse sixth power of the distance between the atoms. This London dispersion energy is larger for atoms and molecules whose electron densities fluctuate freely—in other words, those atoms and molecules that are easily polarized by external electric fields. See DIPOLE MOMENT; MOLECULAR ORBITAL THEORY. At small separations the electron clouds can overlap, and repulsive forces arise. These forces are described as exchange–repulsion and are a consequence of the Pauli exclusion principle, a quantummechanical effect which prevents electrons from occupying the same region of space simultaneously (Fig. 2c). To accommodate it, electrons are squeezed out from the region between the nuclei, which repel each other as a result. Each element can be assigned, approximately, a characteristic van der Waals radius; that is, when atoms in different molecules approach more closely than the sum of their radii, the repulsion energy increases sharply. It is this effect that gives molecules their characteristic shape, leading to steric effects in chemical reactions. See EXCLUSION PRINCIPLE; STERIC EFFECT (CHEMISTRY). The other important source of intermolecular forces is the electrostatic interaction. When molecules are formed from atoms, electrons flow from electropositive atoms to electronegative ones, so that the atoms become somewhat positively or negatively charged. In addition, the charge distribution of each atom may be distorted by the process of bond formation, leading to atomic dipole and quadrupole moments. The electrostatic interaction between these is an important source of intermolecular forces, especially in polar molecules, but also in molecules that are not normally thought of as highly polar. The interaction between two ethyne (acetylene) molecules, for example, is dominated by the quadrupole moment associated with the carboncarbon triple bond. A further effect is that the electrostatic field of a molecule may cause polarization of its neighbors, and this leads to a further induction contribution to the intermolecular interaction. An important feature is that an induction interaction can often polarize both molecules in such a way as to favor interactions with further molecules, leading to a cooperative network of intermolecular attractions. This effect is important in the network structure of water and ice. See WATER. An example of intermolecular forces is the hydrogen bond, in which a hydrogen atom attached to an electronegative atom such as oxygen or nitrogen is weakly bonded to another electronegative atom in the same molecule or a different molecule. The dominant attractive contribution in a hydrogen bond is the electrostatic interaction, but there are also significant contributions from dispersion and induction. See HYDROGEN BOND. Occurrence. Intermolecular forces are responsible for many of the bulk properties of matter in all its phases. A realistic description of the relationship between pressure, volume, and temperature of a gas must include the effects of attractive and repulsive forces between molecules. The viscosity, diffusion,
and surface tension of liquids are examples of physical properties which depend strongly on intermolecular forces. Intermolecular forces are also responsible for the ordered arrangement of molecules in solids, and they account for the elasticity of solids and for related properties (such as the velocity of sound in materials). The detailed description of the effects of intermolecular forces in condensed phases is complicated, since every molecule interacts not just with its immediate neighbors but with more distant ones too. See DIFFUSION; ELASTICITY; GAS; SURFACE TENSION; VAN DER WAALS EQUATION; VIRIAL EQUATION; VISCOSITY. Forces of the same kind are also important between nonbonded atoms within a molecule. The repulsions between atoms constrain the possible three-dimensional structures that can be taken up by a large molecule such as a polymer or protein. Hydrogen bonds are especially important in biological molecules. The energy associated with a hydrogen bond is typically of the order of 10–20 kJ/mol, so such bonds are much weaker than normal chemical bonds. In a protein, however, there are many hydrogen bonds between the amino acid residues, and they are responsible for the stability of the alpha helix and other elements of protein structure. The twin strands of DNA are composed of adenine, cytosine, guanine, and thymine molecules, connected by a phosphate–sugar backbone. An adenine molecule in one strand can pair only with a thymine in the other, and cytosine can pair only with guanine. This selectivity, which is responsible for the duplication of the genetic code, arises from the pattern of hydrogen bonds. See DEOXYRIBONUCLEIC ACID (DNA); PROTEIN. Atoms and molecules may be held to a solid surface by intermolecular forces. This weak bonding, called physisorption, has many important applications. The trapping of molecules from the gas phase onto cooled surfaces is the basis of pumps for producing high vacuums. Undesirable odors or colors in food or water may sometimes be removed by filters which capture the offending contamination by physisorption. The selective adsorption of molecules by surfaces is a useful method for separation of mixtures of molecules. Some solid catalysts function by attracting molecules from the gas phase onto the surface, where they can react with each other. See ADSORPTION; CATALYSIS; CHROMATOGRAPHY; VACUUM PUMP. Experimental techniques. Because of their importance, intermolecular forces have been studied extensively for many years. Until the early 1970s, most of the information on intermolecular forces was inferred from the study of matter in bulk. For example, measurements of the viscosity of gases or the crystal structure of solids were used for this purpose. However, such properties depend rather indirectly on the intermolecular forces, and the conclusions of such studies were often ambiguous. Nevertheless, the intermolecular potential energy curve shown in Fig. 1 was derived from experimental data on the viscosity, thermal conductivity, diffusion, and second pressure virial coefficient of argon gas, together with spectroscopic data on the argon dimer, Ar2.
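A common empirical stand-in for a curve like the argon potential of Fig. 1 is the Lennard-Jones 12-6 form. The sketch below only illustrates that assumed model, using frequently quoted literature parameters for argon (a well depth of roughly 120 K in temperature units and a diameter of about 0.34 nm); it is not the experimentally fitted potential described in the text.

# Lennard-Jones 12-6 model: V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6).
# The parameters below are commonly quoted approximate values for argon and are
# an assumption of this sketch, not the fitted potential discussed in the text.
K_B = 1.380649e-23      # Boltzmann constant, J/K
EPS = 120.0 * K_B       # well depth, J (about 120 K)
SIGMA = 0.34e-9         # separation at which V = 0, m (about 0.34 nm)

def lj_energy(r_m):
    """Pair potential energy in joules at separation r_m (meters)."""
    x = (SIGMA / r_m) ** 6
    return 4.0 * EPS * (x * x - x)

for r_nm in (0.30, 0.38, 0.50, 0.80, 1.00):
    print(f"r = {r_nm:.2f} nm   V/k_B = {lj_energy(r_nm * 1e-9) / K_B:9.1f} K")
# Repulsive below about 0.34 nm, a shallow minimum near 0.38 nm, and essentially
# negligible by 1 nm, in line with the ranges described in Fig. 2.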
Fig. 3. Structures of some van der Waals molecules. (a) Hydrogen fluoride dimer. (b) Ethyne dimer. (c) Water trimer; two of the non-hydrogen-bonded H atoms are on one side of the plane of the O atoms, and the third is on the other.
Modern methods study interactions between individual molecules. In molecular-beam collision experiments, low-density streams of atoms or molecules are directed so that individual particles collide. The way in which the molecules rebound from the collision depends on their initial velocities, which can be controlled, and on the forces acting between them. Information about intermolecular forces can be extracted from the experimental data. See MOLECULAR BEAMS. A method which has been increasingly important in recent years, with the development of sophisticated spectroscopic techniques using lasers, involves van der Waals molecules. A van der Waals molecule is a cluster of two or more atoms or molecules in which the molecules retain their individual identity and are held together by intermolecular forces. Such molecules are usually made by expanding a mixture of gases, usually with an inert carrier gas such as argon, into a high vacuum through a narrow nozzle. The resulting molecular beam contains van der Waals molecules, which can be studied spectroscopically. Very detailed information about intermolecular forces can be obtained in this way. In the case of the Ar-HCl cluster, for instance, the interaction energy is now known accurately and in detail as a function of the separation and orientation of the argon and hydrogen chloride components. Other van der Waals molecules have been studied in this way, including clusters of up to six water
molecules, (HF)2, (C2H2)2, and (C2H2)3. The structures of some van der Waals molecules are shown in Fig. 3. The T-shaped structure of ethyne dimer is a consequence of the quadrupole–quadrupole interaction mentioned above, while the bent structure of HF dimer is also attributed to electrostatic forces. Indeed, it has become apparent in recent decades that the electrostatic forces are dominant in determining the structures of van der Waals molecules, primarily because they are repulsive in some orientations and attractive in others, and so are much more orientation-dependent than other intermolecular forces. See LASER. Many of these molecules can rearrange into equivalent but different structures. The HF dimer, for instance, can rearrange in the way shown in Fig. 4, the hydrogen atom in the hydrogen bond moving aside and the other taking its place. The symmetrical intermediate structure is only about 4 kJ/mol higher in energy than the initial and final structures, and tunneling can occur between them, leading to splittings in the spectrum that can be measured. These tunneling splittings provide further information about the nature of the intermolecular potential energy surface. See MOLECULAR STRUCTURE AND SPECTRA. Computational methods. An important source of information about intermolecular potential energy surfaces is quantum theory. The underlying general theory of intermolecular forces has been understood since the 1930s, but only in recent years has it become possible to explore it quantitatively and to understand its finer details. Calculations ab initio, starting from Schrödinger's equation without any empirical information and using quantum-mechanical perturbation theory, can provide accurate details of all features of the forces between molecules, though this approach is limited to relatively small molecules. In this way, it is possible to obtain a deeper
Fig. 4. Donor–acceptor exchange in HF dimer.
understanding of the quantum-mechanical origin of intermolecular forces and to calculate them to an accuracy comparable with experiment. See NONRELATIVISTIC QUANTUM THEORY; QUANTUM CHEMISTRY. Anthony J. Stone Bibliography. J. Israelachvili, Intermolecular and Surface Forces, Academic Press, 1992; J. S. Rowlinson, Cohesion: A Scientific History of Intermolecular Forces, Cambridge University Press, 2002; A. J. Stone, The Theory of Intermolecular Forces, Oxford University Press, 1996; D. J. Wales, Energy Landscapes, Cambridge University Press, 2003.
Internal combustion engine A prime mover that burns fuel inside the engine, in contrast to an external combustion engine, such as a steam engine, which burns fuel in a separate furnace. See ENGINE. Most internal combustion engines are spark-ignition gasoline-fueled piston engines. These are used in automobiles, light- and medium-duty trucks, motorcycles, motorboats, lawn and garden equipment, and light industrial and portable power applications. Diesel engines are used in automobiles, trucks, buses, tractors, and earthmoving equipment, as well as in marine, power-generating, and heavier industrial and stationary applications. This article describes these types of engines. For other types of internal combustion engines see GAS TURBINE; ROCKET PROPULSION; TURBINE PROPULSION. The aircraft piston engine is fundamentally the same as that used in automobiles but is engineered for light weight and is usually air-cooled. See RECIPROCATING AIRCRAFT ENGINE. Engine types. Characteristics common to all commercially successful internal combustion engines include (1) the compression of air, (2) the raising of air temperature by the combustion of fuel in this air at its elevated pressure, (3) the extraction of work from the heated air by expansion to the initial pressure, and (4) exhaust. Four-stroke cycle. William Barnett first drew attention to the theoretical advantages of combustion under compression in 1838. In 1862 Beau de Rochas published a treatise that emphasized the value of combustion under pressure and a high ratio of expansion for fuel economy; he proposed the four-stroke engine cycle as a means of accomplishing these conditions in a piston engine (Fig. 1). The engine requires two revolutions of the crankshaft to complete one combustion cycle. The first engine to use this cycle successfully was built in 1876 by N. A. Otto. See OTTO CYCLE. Otto's engine, like almost all internal combustion engines developed in that period, burned coal gas mixed in combustible proportions with air prior to being drawn into the cylinder. The engine load was generally controlled by throttling the quantity of charge taken into the cylinder. Ignition was by a device such as an external flame or an electric spark,
Fig. 1. Engine cycles. (a) The four strokes of a four-stroke engine cycle. On intake stroke, the intake valve (left) has opened and the piston is moving downward, drawing air and gasoline vapor into the cylinder. On compression stroke, the intake valve has closed and the piston is moving upward, compressing the mixture. On power stroke, the ignition system produces a spark that ignites the mixture. As it burns, high pressure is created, which pushes the piston downward. On exhaust stroke, the exhaust valve (right) has opened and the piston is moving upward, forcing the burned gases from the cylinder. (b) Three-port two-cycle engine. The same action is accomplished without separate valves and in a single rotation of the crankshaft.
so that the timing was controllable. These are essential features of what has become known as the Otto or spark-ignition combustion cycle. Two-stroke cycle. In 1878 Dougald Clerk developed the two-stroke engine cycle by which a similar combustion cycle required only one revolution of the crankshaft. In this cycle, exhaust ports in the cylinder were uncovered by the piston as it approached the end of its power stroke. A second cylinder then pumped a charge of air to the working cylinder through a check valve when the pump pressure exceeded that in the working cylinder. In 1891 Joseph Day simplified the two-stroke engine cycle by using the crankcase to pump the required air. The compression stroke of the working piston draws the fresh combustible charge through a check valve into the crankcase, and the next power stroke of the piston compresses this charge. The piston uncovers the exhaust ports near the end of the power stroke and slightly later uncovers intake ports opposite them to admit the compressed charge from the crankcase. A baffle is usually provided on the piston head of small engines to deflect the charge up one side of the cylinder to scavenge the remaining burned gases down the other side and out the exhaust ports with as little mixing as possible.
Modern engines using this two-stroke cycle have a third cylinder port known as the transfer port (Fig. 1b), instead of the crankcase check valve used by Day. Small engines of this type are widely used where fuel economy is not as important as mechanical simplicity and light weight. They do not need mechanically operated valves, and they develop one power impulse per cylinder for each crankshaft revolution. Two-stroke-cycle engines do not develop twice the power of four-stroke-cycle engines with the same size of working cylinders at the same number of revolutions per minute (rpm). Principal reasons are (1) reduction in effective cylinder volume due to the piston movement required to cover exhaust ports; (2) appreciable mixing of burned (exhaust) gases with the combustible mixture; and (3) loss of some combustible mixture through the exhaust ports with the exhaust gases. Otto-cycle engines. In the idealized four-stroke Otto cycle, combustion is instantaneous and at constant volume. This simplifies thermodynamic analysis, but combustion takes time. Gas pressure during the four strokes of the Otto cycle varies with the piston position as shown by the typical indicator card in Fig. 2a. This is a pressure–volume (PV) card for an 8.7:1 compression ratio. Engine power. To simplify calculations of engine power, the average net pressure during the working stroke, known as the mean effective pressure (mep), is frequently used. It may be obtained from the average net height of the card, which is found by measurement of the area and then division of this area by its length. Similar pressure–volume data may be plotted on logarithmic coordinates as in Fig. 2b, which develops expansion and compression relations as approximately straight lines. The slopes show the values of exponent n to use in equations for PV relationships. The rounding of the plots at peak pressure, with the peak developing after the piston has started its power stroke, even with the spark occurring before the piston reaches the end of the compression stroke, is due to the time required for combustion. Changes in design can vary charge turbulence in the compression space prior to and during combustion. The greater the turbulence, the faster the combustion and the lower the antiknock or octane number required of the fuel, or the higher the compression ratio that may be used with a given fuel without knocking. The amount to which the turbulence can be raised is limited by the increased rate of pressure rise, which increases engine roughness. This must not exceed a level acceptable for automobile or other service. See AUTOMOBILE; AUTOMOTIVE ENGINE; COMBUSTION CHAMBER; COMPRESSION RATIO; MEAN EFFECTIVE PRESSURE; OCTANE NUMBER; SPARK KNOCK. Detonation of a small part of the charge in the cylinder, after most of the charge has burned progressively, causes knock. This limits the compression ratio of an engine with a given fuel.
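Because power is simply the work per cycle (mep times displaced volume) multiplied by the number of power strokes per second, the mean effective pressure defined above converts directly to power. The sketch below is a rough illustration with assumed engine data, not figures taken from the text.

def power_from_mep(mep_kpa, displacement_liters, rpm):
    """Power in kilowatts for a four-stroke engine (one power stroke per two revolutions)."""
    work_per_cycle_j = (mep_kpa * 1e3) * (displacement_liters * 1e-3)
    cycles_per_second = rpm / 60.0 / 2.0
    return work_per_cycle_j * cycles_per_second / 1e3

# Assumed example: 2.0-liter four-stroke engine, mep of 1000 kPa (about 145 psi), 3000 rpm.
print(power_from_mep(1000.0, 2.0, 3000.0), "kW")   # 50.0 kW, roughly 67 hp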
Fig. 2. Typical Otto-cycle pressure–volume indicator card plotted on (a) rectangular coordinates and (b) logarithmic coordinates.
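The ideal-cycle efficiency relation developed in the next paragraph, Eq. (1), is easy to evaluate numerically. The short sketch below simply tabulates it for the two exponents discussed there (the air-standard value n = 1.4 and the experimentally observed n of about 1.3); it is illustrative only.

def otto_efficiency(r, n):
    """Ideal Otto-cycle thermal efficiency from Eq. (1): eta = 1 - 1/r**(n - 1)."""
    return 1.0 - r ** (1.0 - n)

for n in (1.4, 1.3):
    for r in (6, 8, 10):
        print(f"n = {n}, r = {r:2d}:  eta = {otto_efficiency(r, n):.2f}")
# For r = 8 the predicted efficiency falls from about 0.56 with n = 1.4 to about
# 0.46 with n = 1.3; as the text notes, measured efficiencies are lower still.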
Compression ratio. According to classical thermodynamic theory, the thermal efficiency η of the Otto combustion cycle is given by Eq. (1),

η = 1 − 1/r^(n−1)        (1)

where the compression ratio rc and expansion ratio re are the same (rc = re = r). When theory assumes atmospheric air in the cylinder for extreme simplicity, exponent n is 1.4. Efficiencies calculated on this basis are almost twice as high as measured efficiencies. Logarithmic diagrams from experimental data show that n is about 1.3 (Fig. 2b). Even with this value, efficiencies achieved in practice are less than given by Eq. (1), because it assumes instantaneous combustion and 100% volumetric efficiency. This exponent should vary with the fuel-air mixture ratio, and to some extent with the compression ratio. For an 8:1 compression ratio, the exponent should vary from about 1.28 for a stoichiometric (chemically correct) mixture to about 1.31 for a lean mixture. Actual practice gives even lower thermal efficiencies. This is because of the assumed instantaneous changes in cyclic pressure (during combustion and exhaust) and the disregard of heat losses to the cylinder walls. A change in compression ratio causes little change in the mechanical efficiency or the volumetric efficiency; the gain in thermal efficiency from raising the compression ratio therefore appears as a corresponding increase in torque or mean effective pressure. This is frequently of more practical importance than the actual efficiency increase. See THERMODYNAMIC CYCLE. Engine load has little effect on indicated thermal efficiency, provided the fuel–air ratio remains constant and the ignition time is suitably advanced at reduced loads. This compensates for the slower rate of burning that results from dilution of the combustible charge with the larger percentages of burned gases remaining in the combustion space and the reduced turbulence at lower speeds. High compression improves fuel economy because of improved thermal efficiency. However, the increased peak combustion temperature increases
Fig. 3. Effects of advancing or retarding ignition timing from optimum on engine power and resulting octane requirement of fuel in an experimental engine with a combustion chamber having typical turbulence (A) and a highly turbulent design (B) with the same compression ratio. Retarding the spark 7° for 2% power loss reduced octane requirement from 98 to 93 for design A.
emissions of oxides of nitrogen in the exhaust gas. See AIR POLLUTION; SMOG. Ignition timing. High thermal efficiency is obtained from high compression ratios at part loads, where engines normally run at automobile cruising speeds, with optimum spark advance. To avoid knock on available gasolines at wide-open throttle, a reduced or compromise spark advance is used. The tendency of an engine to knock at wide-open throttle is reduced appreciably when the spark timing is reduced 5–10◦ from optimum (Fig. 3). Advancing or retarding the spark timing from optimum results in an increasing loss in mean effective pressure for any normal engine, as shown by the heavy curve in Fig. 3. The octane requirement falls rapidly as the spark timing is retarded, the actual rate depending on the nature of the gasoline as well as on the combustion chamber design. Curves A and B show the effects on a given gasoline of the use of moderate- and high-turbulence combustion chambers, respectively, with the same compression ratio. Because the curve for mean effective pressure is relatively flat near optimum spark advance, retarding the spark for a 1–2% loss is normally acceptable because of the reduction in octane requirement. In addition to the advantages of the higher compression ratio at cruising loads with optimum spark advance, the compromise spark at full load may be advanced toward optimum with higher-octane fuels. Then there is a corresponding increase in full-throttle mean effective pressure. Many automotive engines have an electronic engine control system with an onboard microprocessor that controls spark timing electronically. The microprocessor continuously adjusts ignition timing for optimum fuel economy and drivability, while minimizing exhaust emissions. The microprocessor may be capable of retarding the timing if spark knock occurs. This allows the benefits of using a higher compression ratio and gasoline with a lower octane number without danger of engine-damaging detonation. In some systems, a digital map stored in memory provides a wide range of predetermined ignition
settings (Fig. 4). See CONTROL SYSTEMS; IGNITION SYSTEM; MICROPROCESSOR. Fuel–air ratio. A fuel–air mixture richer than that which develops maximum knock-free mep will permit use of higher compression ratios. However, the benefits derived from compromise or rich mixtures vary so much with mixture temperature and the sensitivity of the octane value of the particular fuel to temperature that this method is not generally practical. Nevertheless, piston-type aircraft engines may use fuel–air mixture ratios of 0.11 or even higher during takeoff, instead of about 0.08, which normally develops maximum mep in the absence of knock. In automotive engines with an electronic engine control system, the microprocessor usually controls the amount of fuel delivered by either a feedback carburetor or a single-point (throttle-body) or a multipoint (port) electronic fuel-injection system. This maintains the mixture at or near the stoichiometric ratio, which minimizes exhaust emissions of hydrocarbons, carbon monoxide, and oxides of nitrogen. However, spark-ignition engines deliver maximum power with an air deficiency of 0–10%, and minimum fuel consumption with about 10% excess air (Fig. 5). Stroke–bore ratio. The ratio of the length of the piston stroke to the diameter of the cylinder bore has no appreciable effect on fuel economy or friction at corresponding piston speeds. Practical advantages that result from the short stroke include the greater rigidity of the crankshaft from the shorter crank cheeks, with crankpins sometimes overlapping main bearings, and the narrower as well as lighter cylinder block that is possible. However, the higher rates of crankshaft rotation for an equivalent piston speed necessitate greater valve forces and require stronger valve springs. Also, the smaller depth of the compression space for a given compression ratio increases the surface-to-volume ratio and the proportion of heat lost by radiation during combustion. In automotive engines, stroke–bore ratios have decreased over the years. Valve timing. The times of opening and closing the valves of an engine in relation to piston position are usually selected to develop maximum power over a desired speed range at wide-open throttle. The
Fig. 4. Three-dimensional ignition map showing 576 timing points stored in the memory of a microprocessor system for a four-cylinder automotive engine. 1 mmHg = 133 Pa. (Ford Motor Co.)
Fig. 5. Graph showing how the excess-air factor affects exhaust-gas composition [carbon monoxide (CO), nitrogen oxides (NOx), and hydrocarbons (HC)], torque (M), and specific fuel consumption (b) in the part-load range of an automotive spark-ignition engine running at a constant midrange speed and cylinder charge. (Robert Bosch Corp.)
timing of these events is usually expressed as the number of degrees of crankshaft rotation before or after the piston reaches the end of one of its strokes. Because of the time required for the burned gas to flow through the exhaust valve at the end of the power stroke of a piston, the exhaust valve usually starts opening considerably before the end of the stroke. If the valve does not open until the piston is near the lower end of its stroke, power is lost at high engine speeds because the piston on its exhaust stroke has to move against gas pressure remaining in the cylinder. If the valve opens earlier than necessary, burned gas is released while its pressure is still high enough to have done useful work on the piston, and that work is lost. For any engine, there is an optimum time for opening the exhaust valve that will develop the maximum power at some particular speed. The power loss at other speeds does not increase rapidly. When an engine is throttled at part load, there is less gas to discharge through the exhaust valve and less need for the valve to be opened as early as at wide-open throttle. The timing of intake valve events is normally selected to trap the largest possible quantity of combustible mixture (air in a diesel engine) in the cylinder when the valve closes at some desired engine speed and at wide-open throttle. The intermittent flow through the intake valve undergoes alternate accelerations and decelerations, which require time. During the intake stroke, the mass of air moving through the passage to the intake valve acquires kinetic energy that may be converted to a slight pressure rise at the valve when the air mass still in the passage is stopped by its closure. Advantage may be taken of this phenomenon at some engine speed to increase the air mass which enters the cylinder.
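Valve events stated in crankshaft degrees correspond to very short times at operating speed, which is why they must lead the piston positions they serve. A minimal sketch follows; the speeds and the 40° event are assumed purely for illustration.

def crank_degrees_to_ms(degrees, rpm):
    """Duration of a crank-angle interval in milliseconds (the crankshaft turns 6*rpm degrees per second)."""
    return degrees / (6.0 * rpm) * 1000.0

for rpm in (1000, 3000, 6000):
    print(f"{rpm:4d} rpm: a 40-degree event lasts {crank_degrees_to_ms(40.0, rpm):.2f} ms")
# 6.67 ms at 1000 rpm shrinks to about 1.11 ms at 6000 rpm.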
The engine speed at which the maximum volumetric efficiency is developed varies with the relative valve area, closure time, and other factors, including the diameter and length of the passage. The curves in Fig. 6 show the characteristic falling off at high speeds from the inevitable throttling action as air flows at increased velocities through any restriction such as a valve or intake passage and venturi of a carburetor. Volumetric efficiency has a direct effect on the mean effective pressure developed in a cylinder, on the torque, and on the power that may be realized at a given speed. Since power is a product of speed and torque, the peak power of an engine occurs at a higher speed than for maximum torque, where the rate of torque loss with any further increase in speed will exceed the rate of speed increase. An engine develops maximum power at a speed about twice that for maximum torque. To obtain maximum torque and power, intakevalve closing may be delayed until the piston has traveled almost half the length of the compression stroke. At engine speeds below those where maximum torque is developed by this valve timing, some of the combustible charge that has been drawn into the cylinder on the intake stroke will be driven back through the intake valve before it closes. This reduces the effective compression ratio at wide-open throttle. The engine has an increasing tendency to develop spark knock as the speed and the resulting gas turbulence are reduced. See ENGINE MANIFOLD. Supercharging spark-ignition engines. Volumetric efficiency and thus the mep of a four-stroke sparkignition engine may be increased over a part of or the whole speed range by supplying air to the engine intake at higher than atmospheric pressure. This is usually accomplished by a centrifugal or rotary pump. The indicated power (power developed in the cylinder) of an engine increases directly with the absolute pressure in the intake manifold. Because fuel consumption increases at the same rate, the indicated specific fuel consumption (fuel flow rate per unit
Fig. 6. Effect of intake-pipe length and engine speed on volumetric efficiency of one cylinder of a six-cylinder engine. 1 in. = 2.5 cm.
power output) is generally not altered appreciably by supercharging. The three principal reasons for supercharging four-cycle spark-ignition engines are (1) to lessen the tapering off of mep at higher engine speed; (2) to prevent loss of power due to diminished atmospheric density, as when an airplane (with piston engines) climbs to high altitudes; and (3) to develop more torque at all speeds. In a normal engine characteristic, torque rises as speed increases but falls off at higher speeds because of the throttling effects of such parts of the fuel intake system as valves and carburetors. If a supercharger is installed so as to maintain the volumetric efficiency at the higher speeds without increasing it in the middle-speed range, peak horsepower can be increased. The rapid fall of atmospheric pressure at increased altitudes causes a corresponding decrease in the power of unsupercharged piston-type aircraft engines. For example, at 20,000 ft (6 km) the air density, and thus the absolute manifold pressure and indicated torque of an aircraft engine, would be only about half as great as at sea level. The useful power developed would be still less because of the friction and other mechanical power losses which are not affected appreciably by volumetric efficiency. By the use of superchargers, which are usually of the centrifugal type, sea-level air density may be maintained in the intake manifold up to considerable altitudes. Some aircraft engines drive these superchargers through gearing which may be changed in flight, from about 6.5 to 8.5 times engine speed. The speed change avoids oversupercharging at medium altitudes with corresponding power loss. Supercharged aircraft engines must be throttled at sea level to avoid damage from detonation or excessive overheating caused by the high mep which would otherwise be developed. See SUPERCHARGER. Normally an engine is designed with the highest compression ratio allowable without knock from the fuel expected to be used. This is desirable for the highest attainable mep and fuel economy from an atmospheric air supply. Any increase in the volumetric efficiency of such an engine would cause it to knock unless a fuel of higher octane number were used or the compression ratio were lowered. When the compression ratio is lowered, the knock-limited mep may be raised appreciably by supercharging but at the expense of lowered thermal efficiency. There are engine uses where power is more important than fuel economy, and supercharging becomes a solution. The principle involved is illustrated in Fig. 7 for a given engine. With no supercharge this engine, when using 93-octane fuel, developed an indicated mean effective pressure (imep; an average pressure forcing the piston down the cylinder) of 180 pounds per square inch (psi; 1240 kilopascals) at the borderline of knock at 8:1 compression ratio. If the compression ratio were lowered to 7:1, the mep could be raised by supercharging along the 7:1 curve to 275 imep before it would be knock-limited by the same fuel. With a 5:1 compression ratio it could be
Fig. 7. Graph showing the relationship between compression ratio and knock-limited imep for given octane numbers, obtained by supercharging a laboratory engine. (After H. R. Ricardo, The High-Speed Combustion Engine, 4th ed., Blackie, 1953)
raised to 435 imep. Thus the imep could be raised until the cylinder became thermally limited by the temperatures of critical parts, particularly of the piston head. Engine balance. Rotating masses such as crank pins and the lower half of a connecting rod may be counterbalanced by weights attached to the crankshaft. The vibration which would result from the reciprocating forces of the pistons and their associated masses is usually minimized or eliminated by the arrangement of cylinders in a multicylinder engine so that the reciprocating forces in one cylinder are neutralized by those in another. Where these forces are in different planes, a corresponding pair of cylinders is required to counteract the resulting rocking couple. If piston motion were truly harmonic, which would require a connecting rod of infinite length, the reciprocating inertia force at each end of the stroke would be as in Eq. (2),

F = 0.000456 W N^2 s        (2)

where W is the total weight
of the reciprocating parts in one cylinder, N is the rpm, and s is the stroke in inches. Both F and W are in pounds. But the piston motion is not simple harmonic because the connecting rod is not infinite in length, and the piston travels more than half its stroke when the crankpin turns 90° from firing dead center. This distortion of the true harmonic motion is due to the so-called angularity a of the connecting rod, shown by Eq. (3),

a = r/l = s/(2l)        (3)

where r is the crank radius, s the stroke, and l the connecting rod length, all in inches. Reciprocating inertia forces act in line with the cylinder axis and may be considered as combinations of a primary force, the true harmonic force from Eq. (2), oscillating at the same frequency as the crankshaft rpm and a secondary force oscillating
Internal combustion engine at twice this frequency having a value of Fa, which is added to the primary at top dead center and subtracted from it at bottom dead center. In general, harmonics above the second order may be disregarded. Therefore, for a connecting rod with the angularity a = 0.291, the inertia force caused by a piston at top dead center is about 1.29 times the pure harmonic force, and at bottom dead center it is about 0.71 times as large. Where two pistons act on one crankpin, with the cylinders in 90◦ V arrangements, the resultant primary force is radial and of constant magnitude, and it rotates around the crankshaft with the crankpin. Therefore, it may be compensated for by an addition to the weight required to counterbalance the centrifugal force of the revolving crankpin and its associated masses. The resultant of the secondary force of the two pistons is 1.41 times as large as for one cylinder, and reciprocates in a horizontal plane through the crankshaft at twice crankshaft speed. In four-cylinder inline engines with crankpins in the same plane, the primary reciprocating forces of the two inner pistons in cylinders 2 and 3 cancel those of the two outer pistons in cylinders 1 and 4, but the secondary forces from all pistons are added. Therefore, they are equivalent to the force resulting from a weight about 4a times the weight of one piston and its share of the connecting rod, oscillating parallel to the piston movement, having the same stroke, but moving at twice the frequency. A large a for this type of engine is advantageous. Where the four cylinders are arranged alternately on each side of a similar crankshaft, and in the same plane, both primary and secondary forces are in balance. Six cylinders in line also balance both primary and secondary forces. V-8 engines use a crank arrangement in which the crankpins are in two planes 90◦ apart. Staggering the crankpins for pistons 1 and 2 90◦ from each other equalizes secondary forces, but the forces are in different planes. The couple this introduces is canceled by an opposite couple from the pistons operating on the crankpins for pistons 3 and 4. Torsion dampers. In addition to vibrational forces from rotating and reciprocating masses, vibration may develop from torsional resonance of the crankshaft at various critical speeds. The longer the shaft for given bearing diameters, the lower the speeds at which these vibrations develop. On automotive engines, such vibrations are dampened by a viscous vibration damper or by a bonded-rubber vibration damper that is similar to a small flywheel coupled to the crankshaft through a rubber ring. The vibration damper may be combined with the pulley for an engine-accessory drive belt. See MECHANICAL VIBRATION. Firing order. The firing order is the sequence in which the cylinders deliver their power impulses to the crankshaft. It is determined by such factors as engine design, ignition intervals, and crankshaft loading. Cylinder arrangements are generally selected for even firing intervals and torque impulses, as well as for balance.
Fig. 8. Various cylinder arrangements and firing orders (4- and 6-cylinder vertical and horizontally opposed engines, and the 8-cylinder 90° V engine).
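For evenly spaced power impulses, a four-stroke engine must fire once every 720 crankshaft degrees divided by the number of cylinders (360 degrees for a two-stroke). The sketch below simply tabulates that relation as background to the arrangements of Fig. 8.

def firing_interval_degrees(cylinders, four_stroke=True):
    """Crank-angle interval between power impulses for evenly spaced firing."""
    degrees_per_cycle = 720.0 if four_stroke else 360.0
    return degrees_per_cycle / cylinders

for n in (4, 6, 8):
    print(f"{n} cylinders: one power impulse every {firing_interval_degrees(n):.0f} degrees")
# Four, six, and eight cylinders fire every 180, 120, and 90 degrees, respectively,
# consistent with the even firing intervals mentioned above.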
Figure 8 shows various cylinder arrangements and firing orders that have been used in automotive engines. The Society of Automotive Engineers (SAE) Standard for engine rotation and cylinder numbering provides that standard rotation is counterclockwise rotation of the crankshaft as viewed from the principal output end of the engine. If power can be delivered from either end, rotation shall be as viewed from the flywheel end. Excluding radial engines or those with coplanar cylinder bore axes, cylinders may be numbered by either of two methods: (1) In single or multibank engines, the cylinders are numbered in the sequence in which the connecting rods are mounted along the crankshaft, beginning with the cylinder farthest from the principal output end. (2) In multibank engines, the cylinders may be numbered in sequence in each bank, starting with the cylinder farthest from the principal output end and designated right or left bank by suffixes R and L, such as 1R and 1L. Cylinder bank and accessory locations are described as right or left when the engine is viewed from the flywheel or principal output end. Compression-ignition engines. In 1897, about 20 years after Otto first ran his engine, Rudolf Diesel successfully demonstrated an entirely different method of igniting fuel. Air is compressed to a pressure high enough for the adiabatic temperature to reach or exceed the ignition temperature of the fuel. Because this temperature is 1000°F (538°C) or higher, compression ratios of 12:1 to 23:1 are used commercially with compression pressures from about 440 to 800 psi (3 to 5.5 megapascals). The fuel is injected into the cylinders shortly before the end of the compression stroke, at a time and rate suitable to control the rate of combustion.
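The compression temperatures and pressures just quoted are roughly what simple adiabatic compression of air predicts. The sketch below assumes an intake charge at 300 K and 14.7 psi and a 16:1 ratio, values chosen only for illustration.

GAMMA = 1.4    # approximate ratio of specific heats for air

def adiabatic_compression(t1_kelvin, p1_psi, compression_ratio):
    """Ideal end-of-compression temperature (K) and pressure (psi)."""
    t2 = t1_kelvin * compression_ratio ** (GAMMA - 1.0)
    p2 = p1_psi * compression_ratio ** GAMMA
    return t2, p2

t2, p2 = adiabatic_compression(300.0, 14.7, 16.0)
t2_f = (t2 - 273.15) * 9.0 / 5.0 + 32.0
print(f"T2 = {t2:.0f} K ({t2_f:.0f} F), p2 = {p2:.0f} psi")
# About 910 K (near 1180 F) and roughly 710 psi, consistent with the ignition
# temperature and the 440-800 psi range given above; real engines run somewhat
# cooler because of heat loss and valve timing.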
Compression ratio and combustion. The idealized diesel engine cycle assumes combustion at constant pressure. Like the Otto cycle, thermal efficiency increases with compression ratio, but it also varies with the amount of heat added (at the constant pressure) up to the cutoff point where the pressure begins to drop from adiabatic expansion. See DIESEL CYCLE; DIESEL ENGINE. Fuel injection. Early diesel engines used air injection of the fuel to develop extremely fine atomization and a good distribution of the spray. But the need for injection air at pressures of about 1500 psi (10 MPa) required expensive and bulky multistage air compressors and intercoolers. A simpler fuel-injection method was introduced by James McKechnie in 1910. He atomized the fuel as it entered the cylinder by use of high fuel pressure and suitable spray nozzles. After considerable development, it became possible to atomize the fuel sufficiently to minimize the smoky exhaust that had been characteristic of the early airless or solid-injection engines. By 1930, solid injection had become the generally accepted method of injecting fuel in diesel engines. During the 1980s, electronically controlled fuel injection began replacing the mechanical system. Electronically controlled, mechanically actuated unit injectors allowed injection pressures of 22,000 psi (150 MPa). See FUEL INJECTION. Supercharged diesel engines. Combustion in a four-stroke diesel engine is improved by supercharging. Fuels that would otherwise smoke heavily and misfire at low loads will burn satisfactorily with supercharging. The indicated mean effective pressure rises directly with the supercharging pressure, until it is limited by the rate of heat flow from the metal parts surrounding the combustion chamber, and the resulting temperatures. When superchargers of either the centrifugal or positive-displacement type are driven mechanically by the engine, the power required becomes an additional loss to the engine output. There is a degree of supercharge for any engine that develops maximum efficiency. A supercharge that is too high absorbs more power in the supercharger than is gained by the engine, especially at low loads. Another means of driving the supercharger is by an exhaust turbine, which recovers some of the energy that would otherwise be wasted in the exhaust. This may be accomplished with so small an increase of back pressure that little power is lost by the engine. The result is an appreciable increase in efficiency at loads high enough to develop the necessary exhaust pressure. See TURBOCHARGER. Supercharging a two-cycle diesel engine requires some means of restricting or throttling the exhaust to build up cylinder pressure at the start of the compression stroke, and is used on a few large engines. Medium and large two-stroke diesel engines are usually equipped with blowers to scavenge the cylinders after the working stroke and to supply the air required for the subsequent cycles. These blowers, in contrast to superchargers, do not build up
appreciable pressure in the cylinder at the start of compression. If the capacity of such a blower is greater than the engine displacement, it will scavenge the cylinder of practically all exhaust products, even to the extent of blowing some air out through the exhaust ports. Such blowers, like superchargers, may be driven by the engine or by exhaust turbines. Contrast between diesel and Otto engines. There are many characteristics of the diesel engine which are in direct contrast to those of the Otto engine. The higher the compression ratio of a diesel engine, the less the difficulties with ignition time lag. Too great an ignition lag results in a sudden and undesired pressure rise which causes an audible knock. In contrast to an Otto engine, knock in a diesel engine can be reduced by use of a fuel of higher cetane number, which is equivalent to a lower octane number. See CETANE NUMBER. The larger the cylinder diameter of a diesel engine, the simpler the development of good combustion. In contrast, the smaller the cylinder diameter of the Otto engine, the less the limitation from detonation of the fuel. High intake-air temperature and density materially aid combustion in a diesel engine, especially of fuels having low volatility and high viscosity. Some engines have not performed properly on heavy fuel until provided with a supercharger. The added compression of the supercharger raised the temperature and, what is more important, the density of the combustion air. For an Otto engine, an increase in either the air temperature or density increases the tendency of the engine to knock and therefore reduces the allowable compression ratio. Diesel engines develop increasingly higher indicated thermal efficiency at reduced loads because of leaner fuel-air ratios and earlier cutoff. Such mixture ratios may be leaner than will ignite in an Otto engine. Furthermore, the reduction of load in an Otto engine requires throttling, which develops increasing pumping losses in the intake system. Neil MacCoull; Donald L. Anglin Bibliography. Bosch Automotive Handbook, 1986; W. H. Crouse, Automotive Engines, 8th ed., 1994; J. B. Heywood, Internal Combustion Engine Fundamentals, 1988; L. C. R. Lilly (ed.), Diesel Engine Reference Book, 1984; K. Newton, W. Steeds, and T. K. Garrett, The Motor Vehicle, 1989; Society of Automotive Engineers, SAE Handbook, 4 vols., annually; C. F. Taylor, The Internal Combustion Engine in Theory and Practice, vols. 1 and 2, 1985.
Internal energy A characteristic property of the state of a thermodynamic system, introduced in the first law of thermodynamics. For a static, closed system (no bulk motion, no transfer of matter across its boundaries), the change ΔU in internal energy for a process is equal to the heat Q absorbed by the system from its surroundings minus the work W done by the system on its surroundings. Only a change in internal energy can be measured, not its value for any single state. For a given process, the change in internal energy is fixed by the initial and final states and is independent of the path by which the change in state is accomplished. The internal energy includes the intrinsic energies of the individual molecules of which the system is composed and contributions from the interactions among them. It does not include contributions from the potential energy or kinetic energy of the system as a whole; these changes must be accounted for explicitly in the treatment of flow systems. Because it is more convenient to use an independent variable (the pressure P for the system instead of its volume V), the working equations of practical thermodynamics are usually written in terms of such functions as the enthalpy H = U + PV, instead of the internal energy itself. See CHEMICAL THERMODYNAMICS; ENTHALPY. Paul J. Bender

International Date Line The 180° meridian, where each day officially begins and ends. As a person travels eastward, against the apparent movement of the Sun, 1 h is gained for every 15° of longitude; traveling westward, time is lost at the same rate. Two people starting from any meridian and traveling around the world in opposite directions at the same speed would have the same time when they meet, but would be 1 day apart in date. If there were no international agreement as to where each day should begin and end, there could be any number of places so designated. To eliminate such confusion, the International Meridian Conference, in 1884, designated the 180° meridian as the location for the beginning of each day. Thus, when a traveler goes west across the line, a day is lost; if it is Monday to the east, it will be Tuesday immediately as the traveler crosses the International Date Line. In traveling eastward a day is gained; if it is Monday to the west of the line, it will be Sunday immediately after the line is crossed. An interesting example can be taken from conditions now nearly attainable with jet aircraft. If one could board such a plane, say at noon in Washington, D.C., and fly westward at that latitude around the world to return in 24 h, the rate would match the rotation of the Earth. Although constantly under the noontime Sun, this traveler would need to adjust her calendar 1 day ahead upon crossing the International Date Line, because she would arrive in Washington at noon, 24 h after embarking. Thus her calendar day would agree with that of the Washingtonians.

The International Date Line.

The 180° meridian is ideal for serving as the International Date Line (see illus.). It is exactly halfway around the world from the zero, or Greenwich, meridian, from which all longitude is reckoned. It also falls almost in the center of the largest ocean; consequently there is the least amount of inconvenience as regards population centers. A few devi-
ations in the alignment have been made, such as swinging the line east around Siberia to keep that area all in the same day, and westward around the Aleutian Islands so that they will be within the same day as the rest of Alaska. Other minor variations for the same purpose have been made near the Fiji Islands, in the South Pacific. See MATHEMATICAL GEOGRAPHY. Van H. English
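The rule of 1 h per 15° of longitude described above is easy to tabulate; the longitudes below are arbitrary examples chosen for illustration.

def nominal_offset_hours(longitude_deg_east):
    """Nominal solar-time offset from the Greenwich meridian, positive eastward."""
    return longitude_deg_east / 15.0

for lon in (0.0, 75.0, -120.0, 180.0):
    print(f"longitude {lon:6.1f} deg: offset {nominal_offset_hours(lon):+5.1f} h")
# Legal time zones (and the date line itself, as noted above) deviate from these
# nominal values for practical reasons.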
Internet A worldwide system of interconnected computer networks. The Internet has begun to transform the way that communication systems operate. The origins of the Internet can be traced to the creation of ARPANET (Advanced Research Projects Agency Network) as a network of computers under the auspices of the U.S. Department of Defense in 1969. Today, the Internet connects millions of computers around the world in a nonhierarchical manner unprecedented in the history of communications. The Internet is a product of the convergence of media, computers, and telecommunications. It is not merely a technological development but the product of social and political processes as well, involving both the academic world and the government (the Department of Defense). From its origins in a nonindustrial, noncorporate environment and in a purely scientific culture, it has quickly diffused into the world of commerce. While the Internet has had a sudden and dramatic impact on the global economic and social order, it took almost 30 years to emerge as a major technological force (see table). The Internet is a combination of several media technologies and an electronic version of newspapers, magazines, books, catalogs, bulletin boards, and much more. This versatility gives the Internet its power. However, it is difficult to make precise predictions about the success of the Internet because of the complicated relationships it has created among technologies, markets, and political systems. Contributing to the development of the Internet were a wide variety of users who adopted the Internet to communicate among themselves in a semiprivate way. These users were not bound by any customs and practices, but initiated a culture of communication that was flexible, quick, unbounded, and nonhierarchical, where there was no superior authority in control of the communication system and no standards to govern performance. The Internet has grown dramatically since about 1988, although its impact was not felt until about
1995. By mid-2000, estimates of the number of connected computers were around 80 million. The average user gets an Internet connection with the help of an Internet service provider (ISP). An Internet service provider is a commercial organization that provides an Internet dial-up account for the user. In order to utilize the service, the user needs to have a dial-up program, an e-mail program, and a Web browser program. These generally come as a standard package with the purchase of a computer. Technological features. The Internet's technological success depends on its principal communication tools, the Transmission Control Protocol (TCP) and the Internet Protocol (IP). They are referred to frequently as TCP/IP. A protocol is an agreed-upon set of conventions that defines the rules of communication. TCP breaks down and reassembles packets, whereas IP is responsible for ensuring that the packets are sent to the right destination. Data travels across the Internet through several levels of networks until it reaches its destination. E-mail messages arrive at the mail server (similar to the local post office) from a remote personal computer connected by a modem, or from a node on a local-area network. From the server, the messages pass through a router, a special-purpose computer that ensures each message is sent to its correct destination. A message may pass through several networks to reach its destination. Each network has its own router that determines how best to move the message closer to its destination, taking into account the traffic on the network. A message passes from one network to the next, until it arrives at the destination network, from where it can be sent to the recipient, who has a mailbox on that network. See LOCAL-AREA NETWORKS; WIDE-AREA NETWORKS. TCP/IP. TCP/IP is a set of protocols developed to allow cooperating computers to share resources across networks. TCP/IP establishes the standards and rules by which messages are sent through the networks. The most important traditional TCP/IP services are file transfer, remote login, and mail transfer.
Early history of the Internet

Year   Event
1969   ARPANET (Advanced Research Projects Agency Network) online, connecting four university computer centers using NCP (network control protocol)
1972   With 30 host computers in network, public demonstration at a computer conference in Washington, DC; Internetworking Group (INWG) formed to establish standard protocols
1977   THEORYNET online, providing e-mail to over 100 non-defense-contracting computer researchers at the University of Wisconsin
1979   UNIX User Network started, to send technical messages, and later adapted for other discussion groups
1981   Corporate funding of BITNET, providing e-mail to some academics
1982   INWG establishes TCP/IP (Transmission Control Protocol/Internet Protocol) as the standard on ARPANET
1983   MILNET online, devoted to operational military nodes formerly on ARPANET
1984   Over 1000 host computers connected to the Internet
1986   NSFNET (National Science Foundation Network) online; management of regional nets by private firms begins
1988   Private networks and for-profit service providers establish links to the Internet for commercial users
1989   Over 100,000 host computers connected to the Internet
1991   World Wide Web software distributed by the European Laboratory for Particle Physics (CERN)
1993   Contract to manage various functions of NSFNET awarded to three corporations; a graphical browser of the Web (MOSAIC) distributed by the National Science Foundation
1995   Widespread popularity of the Internet, particularly e-mail and Web, extending to small businesses, homes, and schools around the world
1997   Internet becomes truly global
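As a toy illustration of the TCP service described above (a reliable, ordered byte stream carried in IP packets that routers forward hop by hop), the sketch below opens a TCP connection with Python's standard socket module; the host name and port are placeholders, not addresses taken from the text.

import socket

HOST = "example.invalid"   # placeholder name; replace with a real server to run this
PORT = 7                   # the classic "echo" service port, used here only as an example

def send_over_tcp(payload):
    # create_connection resolves the name and completes the TCP handshake; the
    # route between the two hosts is chosen hop by hop by intervening routers.
    with socket.create_connection((HOST, PORT), timeout=5) as conn:
        conn.sendall(payload)        # TCP segments the data, resends lost pieces,
        reply = conn.recv(1024)      # and delivers the reply bytes in order.
        print("received", len(reply), "bytes")

# send_over_tcp(b"hello")   # commented out: HOST above is not a real machine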
The file transfer protocol (FTP) allows a user on any computer to get files from another computer, or to send files to another computer. Security is handled by requiring the user to specify a user name and password for the other computer. The network terminal protocol (TELNET) allows a user to log in on any other computer on the network. The user starts a remote session by specifying a computer to connect to. From that time until the end of the session, anything the user types is sent to the other computer. Mail transfer allows a user to send messages to users on other computers. Originally, people tended to use only one or two specific computers. They would maintain "mail files" on those machines. The computer mail system is simply a way for a user to add a message to another user's mail file. Other services have also become important: resource sharing, diskless workstations, computer conferencing, transaction processing, security, multimedia access, and directory services. TCP is responsible for breaking up the message into datagrams, reassembling the datagrams at the other end, resending anything that gets lost, and putting things back in the right order. IP is responsible for routing individual datagrams. The datagrams are individually identified by a unique sequence number to facilitate reassembly in the correct order. The whole process of transmission is done through the use of routers. Routing is the process by which two communication stations find and use the optimum path across any network of any complexity. Routers must support fragmentation, the ability to subdivide received information into smaller units where this is required to match the underlying network technology. Routers operate by recognizing that a particular network number relates to a specific area within the interconnected networks. They keep track of the numbers throughout the entire process. Internet connection speeds. In the conventional system, a stand-alone personal computer accesses the Internet through a dial-up telephone connection via a modem. This provides a temporary connection, known as a SLIP (Serial Line Internet Protocol) or PPP (point-to-point protocol) connection, that enables full access to the Internet as long as the telephone connection is maintained. See MODEM. Cable modem. A more advanced system is via the cable modem, which provides Internet service at much higher speeds. For example, a 2-megabyte file, which might take 9 min to download in the conventional system with a 28.8-kilobit-per-second telephone modem, can be downloaded in about 10 s using a cable modem. The cable modem is faster because it uses the digital fiber-optic cable network, which is superior to the conventional telephone system. Since the cable modem is always connected, there is no need to log in or dial up. However, the cable modem tends to be rather slow at peak hours, when a number of subscribers try to send or receive data at the same time. In addition, the number of lines available via cable modem is limited, and is not likely to increase appreciably in the near future, despite a rapidly growing demand for higher speeds.
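The download-time comparison above is simple arithmetic; the sketch below reproduces it, taking the file size and modem rate from the text and assuming a representative 1.5-megabit-per-second cable rate.

def transfer_time_seconds(size_megabytes, rate_kilobits_per_second):
    """Idealized transfer time, ignoring protocol overhead (1 MB taken as 10**6 bytes)."""
    bits = size_megabytes * 8.0e6
    return bits / (rate_kilobits_per_second * 1000.0)

for label, rate in (("28.8-kbps telephone modem", 28.8), ("1500-kbps cable modem", 1500.0)):
    print(f"{label:26s}: {transfer_time_seconds(2.0, rate):6.1f} s")
# Roughly 9-10 minutes versus about 10 s, matching the figures quoted above.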
DSL. The digital subscriber line (DSL) offers a faster Internet connection than a standard dial-up connection or the cable modem connection. DSL uses the existing telephone line and in most cases does not require an additional line. Once installed, the DSL router provides the customer site with a continuous connection to the Internet, without causing a busy signal on the telephone line. In its various forms, DSL offers users a choice of speeds ranging from 32 kilobits per second to more than 50 megabits per second. Most Internet service providers offer symmetric DSL data services at a series of speeds so that customers can choose the rate that meets their specific business needs. Over any given link, the maximum DSL speed is determined by the distance between the customer site and the central office. At the customer premises, a DSL router or modem connects the DSL line to a local-area network or an individual computer. DSL takes the existing voice cables that connect customer premises to the telephone company's central office and turns them into a high-speed digital link. ADSL. Asymmetric digital subscriber line (ADSL) is a broadband communication technology designed for use on regular telephone lines. It is called asymmetric because more bandwidth is reserved for receiving data than for sending data. It has the ability to move data over telephone lines at speeds up to 140 times faster than the fastest analog modems available. Domain Name System. The addressing system on the Internet generates IP addresses, which are usually indicated by numbers such as 128.201.86.29. Since such numbers are difficult to remember, a user-friendly system has been created known as the Domain Name System (DNS). This system provides the mnemonic equivalent of a numeric IP address and further ensures that every site on the Internet has a unique address. For example, an Internet address might appear as crito.uci.edu. If this address is accessed through a Web browser, it is referred to as a URL (Uniform Resource Locator), and the full URL will appear as http://www.crito.uci.edu. The Domain Name System divides the Internet into a series of component networks called domains that enable e-mail (and other files) to be sent across the entire Internet. Each site attached to the Internet belongs to one of the domains. Universities, for example, belong to the "edu" domain. Other domains are gov (government), com (commercial organizations), mil (military), net (network service providers), and org (nonprofit organizations). An Internet address is made up of two major parts separated by an @ (at) sign. The first part of the address (to the left of the @ sign) is the username, which usually refers to the person who holds the Internet account and is often the person's login name or in some way identifies him or her. The second part of the address (to the right of the @ sign) contains the hostname (which can refer to a server on a network), followed by the Internet address, which together identify the specific computer where the person has an Internet e-mail account. For example, the address jlin@dee.uxyz.edu can be diagrammed as shown in the illustration.
Diagram of the relationship of the sets of computers specified by the Internet address [email protected]. (The nested sets in the diagram are labeled edu, uxyz, dee, and jlin.)
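Name resolution of the kind described above can be exercised directly from the Python standard library. The sketch below is illustrative only: the e-mail address is a hypothetical placeholder built from the labels in the diagram, the lookup target is a stand-in for a real site, and an actual query requires network access to a DNS server.

```python
import socket

# Split a hypothetical e-mail address into its two major parts.
address = "jlin@dee.uxyz.edu"          # placeholder address, not a real account
username, hostname = address.split("@")
print("username:", username)
print("host and domain:", hostname)

# Ask the Domain Name System for the numeric IP address behind a hostname.
try:
    ip = socket.gethostbyname("www.example.com")   # stand-in hostname
    print("www.example.com resolves to", ip)
except socket.gaierror as err:
    print("lookup failed:", err)
```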
International domains. Because the Internet has become a global medium, many questions arise as to how the domains are to be managed and controlled, and by whom. Currently, a nongovernmental organization, the Internet Corporation for Assigned Names and Numbers (ICANN), performs a supervisory or advisory function in regard to standards, multilingual sites, domain names, and dispute resolution relative to international domain issues, and in particular, cybersquatting. Generally speaking, cybersquatting refers to the practice of registering a domain name of a well-known trademark in hopes that the owner of the trademark will eventually pay to get it back. Cybersquatting can also be considered an infringement of a property right even when it is done unintentionally but has the same effect. Since there is no case law or precedents governing Internet maintenance and management at the international level (or for that matter at the national level), many areas are open to question. Besides ICANN, other organizations specialize in various technical and management issues relating to Internet use and dissemination at the global as well as the local level. They work in consultation with ICANN.
Communication on the Internet. Electronic mail, or e-mail, may be the most heavily used feature of the Internet. Binary files, such as pictures, videos, sounds, and executable files, can be attached to e-mail messages. Because the Internet is not able to directly handle binary files in e-mail, the file must be encoded in one of a variety of schemes. Popular schemes are MIME and uuencode. The person who receives the attached binary file (called an attachment) must decode the file with the same scheme that was used to encode the file. Many e-mail software packages do this automatically. See ELECTRONIC MAIL.
Usenet. This is the world's largest electronic discussion forum, and it provides a way for messages to be sent among computers across the Internet. People participate in discussions on thousands of topics in specific areas of interest called newsgroups. There are at least 20 major hierarchies of newsgroups, such as recreation (identified by the letters "rec") and computers (identified by the letters "comp"). Within these major hierarchies are subcategories (such as rec.arts) and further subcategories (such as rec.arts.books).
Chats. One of the most immediate ways to communicate with others via the Internet is to participate in
"chats" in which a person holds live keyboard "conversations" with other people on the Internet. A person can hold chats with many people simultaneously.
Internet telephony. It is also possible to make telephone calls on the Internet. This can be done in two ways. A computer and special hardware and software can be used to make calls, so that communication is through the computer. In the second method, often referred to as Internet telephony, the user can make calls in the usual way, except that the call is routed over the Internet rather than the normal telephone service. This usually cuts the cost of making long-distance calls. In fact, the Internet telephone call is free; the user pays only for the Internet connection.
Intranets. An intranet is a package of services that are accessible only within an organization. An intranet works over an organization's internal network and is set up for a wide variety of purposes, including e-mail, group brainstorming, group scheduling, access to corporate databases and documents, and videoconferencing, as well as buying and selling goods and services. Intranets use TCP/IP networks and technologies as well as Internet resources such as the World Wide Web, e-mail, Telnet, and FTP. An intranet is separated from the rest of the Internet by a firewall, a hardware and software combination that prohibits unauthorized access to the intranet. See COMPUTER SECURITY.
Telnet. A remarkable feature of the Internet is that it lets an individual use the resources of a distant computer. A person can log onto another computer, issue commands as if he or she were at that computer's keyboard, and then gain access to all the computer's resources. This is done with an Internet resource called Telnet. Telnet follows a client-server model, which allows a piece of software running on the user's own personal computer (the client) to use the resources of a distant server computer. The host allows many clients to access its resources at the same time; it is not devoted to a single user. See CLIENT-SERVER SYSTEM. Many hosts on the Internet can be accessed by using Telnet. They are all different computers, so many of them do not work or look alike. For example, some might be UNIX-based systems, some might be Windows NT/2000-based computers, some might be Macintoshes, and so on. To connect using Telnet, the client must use terminal emulation, in order to ensure, in essence, that the client's keyboard and monitor function as the host expects. The most common terminal emulation is the VT-100, which is safe to use.
World Wide Web. The World Wide Web (WWW) is based on technology called hypertext. Most of the Web development has taken place at the European Laboratory for Particle Physics (CERN). The Web may be thought of as a very large subset of the Internet, consisting of hypertext and hypermedia documents. A hypertext document is a document that has a reference (or link) to another hypertext document, which may be on the same computer or in a different computer that may be located anywhere in the world. Hypermedia is a similar concept except that
it provides links to graphic, sound, and video files in addition to text files. In order for the Web to work, every client must be able to display every document from any server. This is accomplished by imposing a set of standards known as a protocol to govern the way that data are transmitted across the Web. Thus data travel from client to server and back through a protocol known as the HyperText Transfer Protocol (http). In order to access the documents that are transmitted through this protocol, a special program known as a browser is required, which browses the Web. See WORLD WIDE WEB.
Commerce on the Internet. Commerce on the Internet is known by a few other names, such as e-business, e-tailing (electronic retailing), and e-commerce. The strengths of e-business depend on the strengths of the Internet. The Internet is available practically all over the world, at all times. It is simple to use, and the transaction costs for the end user are low. The costs are also extremely low for the vendors on the Internet, compared to traditional distribution channels. The Internet allows for two-way communication and is built on open standards. The two-way communication, in turn, allows for direct feedback of the customers, and the open standards mean interoperability between companies. It is fairly easy to integrate processes, services, and products, once they have been digitized. Internet commerce is divided into two major segments, business-to-business (B2B) and business-to-consumer (B2C). In each are some companies that have started their businesses on the Internet, and others that have existed previously and are now transitioning into the Internet world. Some products and services, such as books, compact disks (CDs), computer software, and airline tickets, seem to be particularly suited for online business. Alladi Venkatesh
Bibliography. U. D. Black, Internet Architecture: An Introduction to IP Protocols, Prentice Hall PTR, Upper Saddle River, NJ, 2000; D. Comer, The Internet Book: Everything You Need To Know about Computer Networking and How the Internet Works, 3d ed., Prentice-Hall, Upper Saddle River, NJ, 2000; H. M. Deitel, P. J. Deitel, and T. R. Nieto, Internet and the World Wide Web: How To Program, Prentice Hall, Upper Saddle River, NJ, 2000; C. Dhawan, Remote Access Networks: PSTN, ISDN, ADSL, Internet and Wireless, McGraw-Hill, New York, 1998; E. A. Hall, Internet Core Protocols: The Definitive Guide, O'Reilly, Cambridge, MA, 2000; D. Minoli and A. Schmidt, Internet Architectures, Wiley, New York, 1999; M. Stefik, The Internet Edge: Social, Legal, and Technological Challenges for a Networked World, MIT Press, Cambridge, MA, 1999.
Interplanetary matter Low-density dust or gas that fills the space in a planetary system around or between the planets. Most interplanetary matter in the inner solar system is dust created by collisions among asteroids or released by
comets as they pass by the Sun. Ionized gas, launched at high speeds from the Sun as the solar wind, also permeates the solar system and creates a variety of important electromagnetic effects where it interacts with planets. Viewed from a nearby star, the interplanetary matter around the Sun would outshine the Earth. Recent detections of large quantities of similar material around other stars—many of which appear to harbor planets—have rekindled widespread interest in this subject. Zodiacal dust. A cloud of dust, called zodiacal dust, fills the plane of the solar system interior to the asteroid belt. An observer on the Earth can see this dust with the unaided eye on some clear moonless nights because it scatters sunlight in the Earth’s direction. It appears as a faint triangle of light in the ecliptic plane, which connects the constellations of the zodiac. Sometimes, backscattered sunlight from the dust can be seen as the Gegenschein, an additional faint patch of light in the antisolar direction. See ZODIACAL LIGHT. Two satellites, the Infrared Astronomical Satellite (IRAS) and the Cosmic Background Explorer (COBE), made detailed maps of the zodiacal cloud as seen from Earth orbit, using detectors that were sensitive at a wide range of infrared wavelengths (1– 200 micrometers). While a human being, who observes light at visible wavelengths, can hardly discern the zodiacal cloud among the stars and the galactic background, instruments in space that sense midinfrared wavelengths (7–30 µm) detect a bright glare from this dust cloud that dominates all other background sources. This glare is thermal emission akin to the glow from a red-hot stove. See HEAT RADIATION; INFRARED ASTRONOMY. Most of this zodiacal dust comes from collisions among asteroids. Prominent families of asteroids, groups of asteroids that appear to have fragmented from a single larger body, create bands of dust that both IRAS and COBE detected. Comets also contribute large amounts of dust as they pass near the Sun. IRAS and COBE detected trails of dust left by individual comets, mixing into the background cloud. See ASTEROID; COMET. The cloud dust takes the shape of a disk whose thickness is proportional to the distance from the Sun. At the asteroid belt, most of the dust sits closer than 0.2 astronomical unit from the midplane of the solar system. (One astronomical unit, or AU, equals the mean distance from the Earth to the Sun, 1.496 × 108 km or 9.3 × 107 mi.) Zodiacal dust has been measured within a few solar radii from the Sun in the F corona, where it reaches sublimation temperatures. In between the asteroid belt and the F corona, the number density of the particles falls off as r−1.3, where r is the distance from the Sun. See SUN. The sizes of the particles lie on a continuum from 1-µm-diameter dust particles to the size of asteroids, the largest of which is 950 km (590 mi) in diameter. The larger objects contain most of the interplanetary mass, but the smaller objects combined have most of the surface area, and so emit and reflect the most light. Near the Earth, the density of the cloud is about 3 × 10−23 g cm−3.
A parameter called β, the ratio of the magnitude of the radiation pressure force on a particle to the magnitude of the Sun's gravitational force on the particle, determines the dynamical behavior of each particle. In general, small particles, which have a high surface-area-to-mass ratio, have high β values. Particles smaller than about 1 µm tend to have β values near 1; radiation pressure rapidly blows them out of the solar system. These tiny particles that stream away from the Sun are called β-meteoroids. See RADIATION PRESSURE.
Solar photons add energy to the orbits of dust grains, but because the radiation pressure from these photons is outward, they do not add angular momentum. When the particles reradiate this energy, however, the radiation is beamed in the direction of the particle's motion, according to special relativity. The reaction to the force of the beamed radiation, the Poynting-Robertson drag, removes angular momentum from the orbit. Due to Poynting-Robertson drag, zodiacal dust with β somewhat less than 1 spirals in from the asteroid belt, reaching the Sun in roughly 10,000 years for a 10-µm particle, 10 times as long for a 100-µm particle. In general, because of the effects of radiation pressure and Poynting-Robertson drag, the steady presence of large amounts of dust in a planetary system implies the existence of a population of larger orbiting bodies to supply it. See POYNTING'S VECTOR.
The shape and composition of zodiacal dust particles vary depending on their origin, and none is homogeneous and spherical. However, these particles fly so fast through the inner solar system, up to 30 km s−1 (20 mi/s), that any attempt to catch them for close study destroys many particles completely, and biases the sample. One such biased sample, micrometeoroids collected at high altitudes in the Earth's atmosphere, consists of small particles that mostly resemble carbonaceous chondrites, the most primitive meteorites. They are aggregates of micrometer-sized silicate grains, with roughly solar composition, minus the volatile gases, plus some extra carbon and water. Particles that originate from comets are thought to be fluffy and porous, like a bird's nest; the spaces in the structure are where ice used to be before the comet neared the Sun and began to sublimate.
Interplanetary gas: the solar wind. The inner solar system also contains some interplanetary gas, or more specifically, plasma, most of which comes from the Sun in the form of the solar wind. The solar wind flows supersonically in all directions out of the corona of the Sun at typical speeds of 450 km s−1 (280 mi/s) and temperatures of 100,000 K. Like the corona, the solar wind consists mostly of ionized hydrogen. At the Earth's orbit, the density of the wind is roughly 5 particles/cm3 (80 particles/in.3), or about 8 × 10−24 g cm−3. See SOLAR WIND.
Although the wind is electrically neutral, it supports a magnetic field with a strength of about 5 nanotesla (5 × 10−5 gauss) at the orbit of the Earth, which flows outward from the Sun along with the gas. Due to the rotation of the Sun, the path of the solar wind takes the shape of an archimedean spiral,
and the magnetic field lines stretch so that they are locally parallel to this spiral. Interaction with the solar wind charges interplanetary dust particles to typical potentials of +5 volts. This charge allows the magnetic field carried by the wind to entrain the smallest β-meteoroids. This magnetic field also protects the solar system from low-energy galactic cosmic rays. The zone near the Earth where the planet’s magnetic field dominates the field in the solar wind is the Earth’s magnetosphere. Mercury, Jupiter, Saturn, Uranus, and Neptune also have strong magnetic fields, and therefore large magnetospheres. The fields of these planets punch holes in the solar wind, and the solar wind sweeps these magnetospheres into long magnetotails stretching away from the Sun, removing ionized hydrogen and other gases from the atmospheres of these planets in the process. The interaction of the Earth’s magnetic field and the solar wind causes the aurorae. See AURORA; JUPITER; MAGNETOSPHERE; MERCURY (PLANET); NEPTUNE; PLANETARY PHYSICS; SATURN; URANUS. Comets also release gas, which sublimates from ice on their surfaces when they are near the Sun. This gas is mostly water and water-derived molecules such as OH and H3O+. However, in situ measurements of Comet Halley with the Giotto space probe suggest that comets may release up to six times as much mass in solids as in gas. The solar wind soon drags the gas released by comets into long plasma tails and finally pushes it out of the solar system. The solar wind flows all the way out to about 50 AU, where the thermal pressure of the interstellar gas which surrounds the solar system overcomes the pressure of the solar wind, which is mostly ram pressure due to its high velocity flow. This interaction occurs at a shock front called the heliopause. Many other stars have winds and other outflows; these phenomena are not associated with the existence of planets. See INTERSTELLAR MATTER. Outer solar system. The region between the asteroid belt and the orbit of Neptune appears to be relatively devoid of interplanetary dust. The Pioneer 10 and 11 spacecraft and the Ulysses space probe found two minor populations of dust particles in this zone: the β-meteoroids ejected from the inner solar system, and a flow of interstellar dust grains on hyperbolic orbits. Beyond Neptune, a ring of cometsized icy bodies, called the Kuiper Belt, orbits the Sun. Undoubtedly, Kuiper Belt Objects collide with one another and produce a disklike cloud of dust. Neptune’s gravitational effects are expected to confine most of this dust to the region beyond 30 AU from the Sun. Dust so far from the Sun is cold, may be 50 K or less, so that it radiates most of its thermal energy at far-infrared wavelengths (30–200 µm). The low temperature makes the cloud too dim to detect from near the Earth, even though it may be just as massive as the zodiacal cloud. See KUIPER BELT; SPACE PROBE. Other planetary systems. Other stars have circumstellar clouds of dust that may be analogous to the interplanetary dust in the solar system. The Spitzer satellite and other telescopes have imaged dusty disks around many nearby sunlike stars. Some of
these stars are known to host planets as well as dust; the dust we see is interplanetary matter in these extrasolar planetary systems. Surveys with the Spitzer Space Telescope have found many more examples of this phenomenon since its launch in 2003. The strong infrared emission is thought to come from starlight absorbed and reradiated by large quantities of circumstellar dust. Circumstellar dust clouds around stars older than roughly 10 million years are often referred to as debris disks. Several stars in this age group with strong infrared emission, notably Beta Pictoris, AU Microscopii, HR 4796, Fomalhaut, Vega, Epsilon Eridani, and Tau Ceti, appear to be surrounded by rings or disks. These disks are thought to represent the "debris" left over from the process of planet formation. Some debris disks appear to harbor planetary systems. Some have planets that have been detected by close monitoring of the radial velocity of the star to look for the wobble caused by gravitational pull of an orbiting body. Other debris disks have planetary systems inferred from the structure asymmetries of the disks themselves. The debris disk around the star Beta Pictoris has been found to contain small amounts of gas, possibly arising from cometlike bodies. See EXTRASOLAR PLANETS.
The dust in the solar system emits and reflects at least six times as much light as the Earth. However, circumstellar dust around other stars is much more difficult to see than dust in the solar system. Known debris disks around other stars typically have hundreds or thousands of times as much dust as the solar system; less massive disks would be too faint to detect. Imaging disks often requires a special observational tool, like a coronagraph or an interferometer. Debris disks often have relatively little dust interior to a distance of roughly 10 AU from their host stars. When a debris disk contains dust orbiting within a few astronomical units of the star, this dust is called exozodiacal dust, in analogy with the zodiacal dust in the solar system. Exozodiacal dust is considered one of the chief obstacles in the search for extrasolar planets that resemble the Earth.
Planets and interplanetary dust. The planets in the solar system interact with interplanetary dust in several important ways. For one, they can directly intercept some of the particles. Some 400 metric tons (4 × 105 kg, or 8 × 105 pounds-mass) of meteoritic material fall on the Earth every day. Objects more massive than about 1 kg (2.2 lbm) have enough material to survive the passage through the Earth's atmosphere and reach the ground; these are called meteorites. Particles less massive than 10−10 g (2 × 10−13 lbm), smaller than about 1 µm, can also survive the fall because they slow down so fast; these are called micrometeorites. Meteoroids larger than 1 g (2 × 10−3 lbm) produce visible trails as they disintegrate overhead; these are shooting stars (meteors). See METEOR; METEORITE; MICROMETEORITE.
Planets also affect interplanetary dust grains indirectly through their gravitational perturbations. For example, as zodiacal dust spirals past the Earth on its way into the Sun, some of the grains become trapped in resonant orbits with the Earth. If the number of
years it takes for the particle to orbit the Sun is approximately a ratio of two small integers (like 3/2 or 5/3), the slight gravitational tugs the particle feels when it passes near the Earth tend to add up, reinforcing each other. The sum of all these tugs can temporarily counteract Poynting-Robertson drag and halt the dust grain's inward spiral. This process creates a ring of enhanced dust density around the Sun at a distance of roughly 1 AU. The IRAS and COBE missions detected both this ring and an asymmetry in the ring, a wake of further density enhancement trailing the Earth in its orbit.
No existing telescope can directly image a planet around another star. But the large-scale dynamical effects of a planet on a cloud of circumstellar dust are often easier to detect from afar than the planets themselves. For example, resonant phenomena like the Earth's dust ring may cause the asymmetries seen in maps of the dust around the nearby star Vega. Large planets may maintain some of the central holes in debris disks, just as Neptune probably prevents most dust generated in the Kuiper Belt from spiraling past it toward the Sun. Marc J. Kuchner
Bibliography. E. Grün et al. (eds.), Interplanetary Dust, 2001; A. C. Levasseur-Regourd and H. Hasegawa (eds.), Origin and Evolution of Interplanetary Dust, 1991.
Interplanetary propulsion Means of providing propulsive power for flight to the Moon or to a planet. A variety of different propulsion systems can be used. The space vehicles for these missions consist of a series of separate stages, each with its own set of propulsion systems. When the propellants of a given stage have been expended, the stage is jettisoned to prevent its mass from needlessly adding to the inertia of the vehicle. See ROCKET STAGING. By expelling mass at high velocities, the propulsion systems provide impulse which permits the vehicle to execute its flight trajectory and the necessary maneuvers. Interplanetary or lunar trajectories, as well as the associated maneuvers, are complex, and different kinds of propulsion systems are desirable to meet the requirements of the various phases of flight. Although all propulsion systems actually used in interplanetary flight have been chemical rockets, several basically different propulsion systems have been studied. See ION PROPULSION; ROCKET PROPULSION; SPACE FLIGHT; SPACE NAVIGATION AND GUIDANCE; SPACE PROBE; SPACE TECHNOLOGY; SPACECRAFT PROPULSION; SPACECRAFT STRUCTURE. George P. Sutton
Interpolation A process in mathematics used to estimate an intermediate value of one (dependent) variable which is a function of a second (independent) variable when values of the dependent variable corresponding to
several discrete values of the independent variable are known. Suppose, as is often the case, that it is desired to describe graphically the results of an experiment in which some quantity Q is measured, for example, the electrical resistance of a wire, for each of a set of N values of a second variable υ representing, perhaps, the temperature of the wire. Let the numbers Qi, i = 1, 2, . . . , N, be the measurements made of Q and the numbers υi be those of the variable υ. These numbers representing the raw data from the experiment are usually given in the form of a table with each Qi listed opposite the corresponding υi. The problem of interpolation is to use the above discrete data to predict the value of Q corresponding to any value of υ lying between the above υi. If the value of υ is permitted to lie outside these υi, the somewhat more risky process of extrapolation is used. See EXTRAPOLATION.
Graphical interpolation. The above experimental data may be expressed in graphical form by plotting a point on a sheet of paper for each pair of values (υi, Qi) of the variables. One establishes suitable scales by letting 1 in. represent a given number of units of υ and of Q. If υ is considered the independent variable, the horizontal displacement of the ith point usually represents υi and its vertical displacement represents Qi. If, for simplicity, it is assumed that the experimental errors in the data can be neglected, then the problem of interpolation becomes that of drawing a curve through the N data points Pi having coordinates (xi, yi) that are proportional to the numbers υi and Qi, respectively, so as to give an accurate prediction of the value Q for all intermediate values of υ. Since it is at once clear that the N measurements made would be consistent with any curve passing through the points, some additional assumptions are necessary in order to justify drawing any particular curve through the points. Usually one assumes that the υi are close enough together that a smooth curve with as simple a variation as possible should be drawn through the points. In practice the numbers υi and Qi will contain some experimental error, and, therefore, one should not require that the curve pass exactly through the points. The greater the experimental uncertainty the farther one can expect the true curve to deviate from the individual points. In some cases one uses the points only to suggest the type of simple curve to be drawn and then adjusts this type of curve to pass as near the individual points as possible. This may be done by choosing a function that contains a few arbitrary parameters that may be so adjusted as to make the plot of the function miss the points by as small a margin as possible. For a more complete discussion of this topic see CURVE FITTING. For many purposes, however, one uses a French curve and orients it so that one of its edges passes very near a group of the points. Having drawn in this portion of the curve, one moves the edge of the French curve so as to approximate the next group of points. An attempt is made to join these portions of the curve so that there is no discontinuity of slope or curvature at any point on the curve.
Tabular interpolation. This includes methods for finding from a table the values of the dependent variable for intermediate values of the independent variable. Its purpose is the same as graphical interpolation, but one seeks a formula for calculating the value of the dependent variable rather than relying on a measurement of the ordinate of a curve. In this discussion it will be assumed that xi and yi (i = 1, 2, . . . , N), which represent tabulated values of the independent and dependent variables, respectively, are accurate to the full number of figures given. Interpolation then involves finding an interpolating function P(x) satisfying the requirement that, to the number of figures given, the plot of Eq. (1)

y = P(x)    (1)

pass through a selected number of points of the set having coordinates (xi, yi). The interpolating function P(x) should be of such a form that it is made to pass readily through the selected points and is easily calculated for any intermediate value of x. Since many schemes are known for determining quickly the unique nth-degree polynomial that satisfies Eq. (1) at any n + 1 of the tabulated values, and since the value of such a polynomial may be computed using only n multiplications and n additions, polynomials are the most common form of interpolating function. If the subscripts on x and y are reassigned so that the points through which Eq. (1) passes are now (x0, y0), (x1, y1), . . . , (xn, yn), the polynomial needed in Eq. (1) may be written down by inspection, and one has Eq. (2), where Eq. (3) applies:

y = \sum_{k=0}^{n} \frac{L_k(x)}{L_k(x_k)}\, y_k    (2)

L_k(x) = \frac{(x - x_0)(x - x_1) \cdots (x - x_n)}{x - x_k}    (3)

Equation (2) is Lagrange's interpolation formula for unequally spaced ordinates. Since Lk(x) vanishes for all xs in the set x0, x1, . . . , xn except xk, substituting x = xs in the right-hand side of Eq. (2) gives rise to only one nonzero term. This term has the value ys, as required. For n = 1, Eqs. (2) and (3) give rise to Eq. (4),

y = \frac{x - x_1}{x_0 - x_1}\, y_0 + \frac{x - x_0}{x_1 - x_0}\, y_1    (4)

whose plot is a straight line connecting the points (x0, y0) and (x1, y1). Such an interpolation is referred to as linear interpolation and is used in all elementary discussions of interpolation; however, another equivalent form of this equation, given below in Eq. (12), is more often used in these cases.
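As a concrete illustration of Eqs. (2)-(4), the short sketch below implements Lagrange's formula directly. It is a minimal reference implementation written for this article, not part of the original text; the sample points are arbitrary, and for two points the routine reduces to the linear interpolation of Eq. (4).

```python
def lagrange(x, xs, ys):
    """Evaluate the Lagrange interpolating polynomial through (xs[k], ys[k]) at x."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        # Build L_k(x)/L_k(x_k): the product over all j != k of (x - x_j)/(x_k - x_j).
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != k:
                weight *= (x - xj) / (xk - xj)
        total += weight * yk
    return total

# Two points: Eq. (4), ordinary linear interpolation.
print(lagrange(1.5, [1.0, 2.0], [10.0, 20.0]))      # 15.0

# Four points taken from y = x**3: a cubic is reproduced exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [x**3 for x in xs]
print(lagrange(2.5, xs, ys))                        # 15.625, i.e. 2.5**3
```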
Suppose the table were obtained from the equation y = f(x), in which f(x) is some mathematical function having continuous derivatives of all orders up to and including the (n + 1)th. It is then possible to obtain an accurate expression for intermediate values of f(x) by adding to the right-hand side of Eq. (2) a so-called remainder term, expression (5),

\frac{(x - x_0)(x - x_1) \cdots (x - x_n)}{(n + 1)!}\, f^{(n+1)}(\xi)    (5)

where f^(n+1)(ξ) is the (n + 1)th derivative of f(x) at some point x = ξ lying between the smallest and largest of the values x0, x1, . . . , xn. Since the value of ξ is not known, the remainder term is used merely to set an upper limit on the truncation error introduced by using Lagrange's interpolation formula.
If the ordinates are equally spaced, that is, xs = x0 + sh where h is the interval of tabulation, Lagrange's interpolation formula simplifies considerably and may be written as Eq. (6),

y = \sum_{k=-p}^{n-p} A_k(u)\, y_k = A_{-p}(u)\, y_{-p} + \cdots + A_0(u)\, y_0 + \cdots + A_{n-p}(u)\, y_{n-p}    (6)

where n is the degree of the interpolating polynomial that now passes through the n + 1 points (x−p, y−p), . . . , (xn−p, yn−p). Here p is the largest integer less than or equal to n/2, and the A−p(u), . . . , An−p(u) are polynomials in the variable shown as Eq. (7),

u = \frac{x - x_0}{h}    (7)

The polynomials of the variable in Eq. (7) have been tabulated as functions of u.
Inverse interpolation. If the value of y is known and the value of the corresponding independent variable x is desired, one has the problem of inverse interpolation. Since the polynomials Ak(u) in Eq. (6) are known functions of u, and the values of y and yk are also known, the only unknown in this equation is u. Thus the problem reduces to that of finding a real root u of an nth-degree polynomial in the range 0 < u < 1. Having found u, one may find x from Eq. (7). For a discussion of numerical methods for solving for such a root and for more information on interpolation see NUMERICAL ANALYSIS. One may also perform an inverse interpolation by treating x as a function of y. Since, however, the intervals between the yi are not equal, it is necessary to employ the general interpolation formula of Eq. (2) with the xs and ys interchanged.
Round-off errors. In the tabulated values, round-off errors εi resulting from the need to express the entries yi of the table as finite decimals will cause an additional error in the interpolated value of y that must be added to the truncation error discussed before. The effect of these errors on the application of Lagrange's interpolation formula is seen by Eq. (6) to be a total error εT in y given by Eq. (8),

\varepsilon_T = \sum_k A_k(u)\, \varepsilon_k    (8)

Letting e be the smallest positive number satisfying the condition e > |εk| for all k, one knows from Eq. (8) that relation (9) holds:

|\varepsilon_T| \le \sum_k |A_k(u)\, \varepsilon_k| \le e \sum_k |A_k(u)|    (9)

Since the sum of the Ak(u) is equal to 1, the factor Σk|Ak(u)| in Eq. (9) is usually not much larger than 2 or 3, and thus the interpolated value of y has about the same round-off error as the individual entries.
Use of finite differences. For some purposes it is more convenient to use an interpolating formula based not so much on the entries yi of a central difference table as upon their differences:

x−2   y−2
               δy−3/2
x−1   y−1                δ²y−1
               δy−1/2               δ³y−1/2
x0    y0                 δ²y0                  δ⁴y0
               δy1/2                δ³y1/2
x1    y1                 δ²y1                  δ⁴y1
               δy3/2                δ³y3/2
x2    y2                 δ²y2
               δy5/2
x3    y3

Each difference δ^k ys is obtained by subtracting the quantity immediately above and to the left of it from the quantity immediately below and to the left; thus, Eq. (10) can be written,

\delta^k y_s = \delta^{k-1} y_{s+1/2} - \delta^{k-1} y_{s-1/2}    (10)

where k and 2s are required to be integers. For example, δy1/2 = y1 − y0 and δ²y0 = δy1/2 − δy−1/2. An interesting property of a difference table is that, if y, the dependent variable tabulated, is a polynomial of the nth degree in x, its kth difference column will represent a polynomial of degree n − k. In particular, its nth differences will all be equal and all higher differences will be zero. For example, consider a table of cubes and the difference table formed from it by the rule given above:

x    y = x³    δy    δ²y    δ³y    δ⁴y
0       0
               1
1       1             6
               7              6
2       8            12              0
              19              6
3      27            18              0
              37              6
4      64            24              0
              61              6
5     125            30
              91
6     216

Most functions f(x), when tabulated at a small enough interval x = h, behave approximately as polynomials and therefore give rise to a difference table in which some order of difference is nearly constant. Consider, for example, the difference table of log x, in which the third differences fluctuate between 7 and 9 times 10−7:

x       y = log x     δy        δ²y    δ³y
1.00    0.0000 000
                      43 214
1.01    0.0043 214              −426
                      42 788            8
1.02    0.0086 002              −418
                      42 370            9
1.03    0.0128 372              −409
                      41 961            8
1.04    0.0170 333              −401
                      41 560            7
1.05    0.0211 893              −394
                      41 166
1.06    0.0253 059

Experimental data, if taken at a small enough interval of the independent variable, would be expected to exhibit much the same behavior as a mathematical function except for the presence of experimental error. The presence of the latter will cause the differences to have a more or less random fluctuation. The size of the fluctuation may, in fact, be used to indicate the number of significant figures in the data. The constancy of the third differences for log x indicates that for the accuracy and the interval used, a third-degree polynomial may be employed as an interpolating function. Since such a polynomial is determined by the choice of four coefficients, one would expect the interpolation formula to involve four numbers derivable from the difference table. Thus the forward-interpolation formula of Gauss, Eq. (11), can be written:

y = y_0 + u\,\delta y_{1/2} + \frac{1}{2} u(u - 1)\,\delta^2 y_0 + \frac{1}{3!} u(u^2 - 1)\,\delta^3 y_{1/2} + \frac{1}{4!} u(u^2 - 1)(u - 2)\,\delta^4 y_0 + \frac{1}{5!} u(u^2 - 1)(u^2 - 4)\,\delta^5 y_{1/2} + \cdots    (11)

If terminated after the fourth term, it represents a third-degree polynomial in u = (x − x0)/h, and hence in x. It involves the four constants y0, δy1/2, δ²y0, and δ³y1/2. Since any one of the entries in the y column may be chosen as y0, the differences required are picked from a central difference table, for example, in relationship to this entry. The interpolating polynomial obtained passes through the four points (x−1, y−1), (x0, y0), (x1, y1), (x2, y2). In general, the interpolating polynomial will pass through only those points whose y coordinate is needed to form the differences used in the formula. If one terminates the series in Eq. (11) after the second term, one obtains Eq. (12):

y = y_0 + u\,\delta y_{1/2} = y_0 + u(y_1 - y_0)    (12)

This is the linear interpolation formula most often used when making a simple interpolation in a table. There are a great variety of interpolation formulas, such as Gregory-Newton's, Stirling's, and Bessel's, that differ mainly in the choice of differences used to specify the interpolating polynomial.
Difference equations. Repeated application of Eq. (10) may be used to express any difference in terms of the tabulated values, for example, Eqs. (13):

\delta y_{1/2} = y_1 - y_0,\qquad \delta^2 y_0 = y_1 - 2y_0 + y_{-1}    (13)

Expressed in a more general form, Eqs. (13) become Eqs. (14):

\delta f(x) = f\!\left(x + \frac{h}{2}\right) - f\!\left(x - \frac{h}{2}\right),\qquad \delta^2 f(x) = f(x + h) - 2f(x) + f(x - h)    (14)

If one sets the second difference equal to zero, one obtains a so-called difference equation for f(x). In general, a difference equation is any equation relating the values of f(x) at discrete values of x. Difference equations play much the same role in analytical work as differential equations. Because they can be interpreted in terms of only those values of f(x) tabulated at some interval x = h, they are admirably adapted to numerical computations. For this reason most numerical solutions of differential equations involve approximating the equation by a suitable difference equation. For ordinary differential equations the transformation to a difference equation can be made by replacing each derivative in the equation by the appropriate difference expression according to formula (15),

f^{(2k)}(x) \rightarrow \frac{1}{h^{2k}}\, \delta^{2k} f(x),\qquad f^{(2k+1)}(x) \rightarrow \frac{1}{2h^{2k+1}} \left[ \delta^{2k+1} f\!\left(x + \frac{h}{2}\right) + \delta^{2k+1} f\!\left(x - \frac{h}{2}\right) \right]    (15)

where f^(n)(x) designates the nth derivative of f(x). The resulting difference equation can then, as mentioned before, be used to express the relationship between the values of f(x) at the discrete points xs = x0 + sh, s = 0, 1, 2, . . . , n.
Partial difference equations. Suppose one chooses to specify a function of two variables f(x, y) by giving its value at some regular array of points in the xy plane having coordinates (xm, yn). Then, in place of a linear partial differential equation for f(x, y) one has a linear partial difference equation, Eq. (16),

\sum_{s,t} A_{st}\, f(x_{i+s}, y_{j+t}) = g(x_i, y_j)    (16)

where g(x, y) is a known function and i and j are any of the set of integers for which the difference equation has significance. A difference equation in which some of the f(xi+s, yj+t) occur to a power other than the first is termed a nonlinear difference equation. If one employs a square lattice makeup of the points, Eqs. (17),

x_m = x_0 + mh,\qquad y_n = y_0 + nh    (17)

then Laplace's differential equation is approximated by difference equation (18),

f_{i+1,j} + f_{i,j+1} + f_{i-1,j} + f_{i,j-1} - 4f_{ij} = 0    (18)

where, for simplicity, Eq. (19) holds:

f_{mn} = f(x_m, y_n)    (19)

See GRAPHIC METHODS; LATTICE (MATHEMATICS). Kaiser S. Kunz
Bibliography. K. E. Atkinson, Elementary Numerical Analysis, 2d ed., 1993; R. L. Burden and J. D. Faires, Numerical Analysis, 7th ed., 2001; J. Gregory and D. Redmond, Introduction to Numerical Analysis, 1994; J. Szabados and P. Vertesi (eds.), Interpolation of Functions, 1990.
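The finite-difference machinery described above is easy to reproduce in a few lines of code. The sketch below is an illustrative implementation written for this article, not part of the original text: it builds successive difference columns, confirms that the third differences of y = x³ are constant, and then applies the two-term (linear) form of Gauss's formula, Eq. (12), to the log x table.

```python
import math

def difference_columns(ys, order):
    """Return [y, δy, δ²y, ...] up to the requested order of differences."""
    cols = [list(ys)]
    for _ in range(order):
        prev = cols[-1]
        cols.append([b - a for a, b in zip(prev, prev[1:])])
    return cols

# Table of cubes: third differences are all equal, higher differences vanish.
cubes = [x**3 for x in range(7)]
print(difference_columns(cubes, 4)[3])   # [6, 6, 6, 6]

# Linear interpolation, Eq. (12), applied to the log x table at x = 1.015.
xs = [1.00, 1.01, 1.02, 1.03, 1.04, 1.05, 1.06]
ys = [math.log10(x) for x in xs]
h = 0.01
x = 1.015
u = (x - xs[1]) / h                      # take x0 = 1.01, so u = 0.5
y = ys[1] + u * (ys[2] - ys[1])
print(round(y, 7))                       # close to log10(1.015) = 0.0064660
```

Similarly, the five-point difference equation (18) for Laplace's equation can be solved by repeatedly replacing each interior value with the average of its four neighbors (a Jacobi-type iteration). The grid size, boundary values, and sweep count below are arbitrary choices made for this sketch.

```python
N = 6                                     # number of interior points per side (arbitrary)
f = [[0.0] * (N + 2) for _ in range(N + 2)]
for j in range(N + 2):                    # hold one boundary edge at 1, the rest at 0
    f[0][j] = 1.0

for _ in range(500):                      # enough sweeps to converge on this tiny grid
    g = [row[:] for row in f]
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            # Eq. (18) rearranged: f_ij = (f_{i+1,j} + f_{i,j+1} + f_{i-1,j} + f_{i,j-1}) / 4
            g[i][j] = 0.25 * (f[i + 1][j] + f[i][j + 1] + f[i - 1][j] + f[i][j - 1])
    f = g

print(round(f[1][N // 2], 4))             # value near the edge held at 1
print(round(f[N][N // 2], 4))             # value near the far edge (much smaller)
```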
Interstellar extinction
Dimming of light from the stars due to absorption and scattering by grains of dust in the interstellar medium. In the absorption process the radiation disappears and is converted into heat energy in the interstellar dust grains. In the scattering process the direction of the radiation is altered. Interstellar extinction produces a dimming of the light from stars situated beyond interstellar clouds of dust according to the equation Fλ = Fλ (0)e−τλ , where Fλ is the observed flux of radiation from the star at a wavelength λ, Fλ(0) the flux that would be observed in the absence of interstellar extinction, and τ λ the dimensionless optical depth for interstellar extinction at λ. Measures of the radiation from pairs of stars of similar intrinsic properties but with differing amounts of interstellar extinction can be used to obtain information about τ λ, which can then be used to provide clues about the nature of the interstellar dust grains. See ABSORPTION OF ELECTROMAGNETIC RADIATION; SCATTERING OF ELECTROMAGNETIC RADIATION. An interstellar extinction curve is a plot of the extinction optical depth τ λ versus the wavelength or frequency of the radiation. It is common for astronomers to express extinction in the astronomical magnitude system. In that case the extinction in magnitudes, Aλ, and the extinction optical depth are related by Aλ = 1.086τ λ. A typical interstellar extinction curve for a 500-parsec path (1630 light-years, 9.58 × 1015 mi, or 1.542 × 1016 km) through the interstellar medium of the Milky Way disk is shown in the illustration. The spectral regions spanned by the curve are indicated. The extinction curve exhibits a nearly linear increase from infrared to visual wavelengths. In the ultraviolet there is a pronounced extinction peak near 1/λ = 4.6 µm−1 or λ = 217.5 nm, followed by an extinction minimum and
An interstellar extinction curve for a typical 500-parsec path through the interstellar medium of the Milky Way disk. Extinction optical depth τλ at wavelength λ is plotted versus 1/λ in µm−1; the curve spans the infrared, visible, and ultraviolet segments of the spectrum.
rapidly rising extinction to the shortest ultraviolet wavelengths for which data exist. Details not shown in the illustration include weak enhancements in extinction in the infrared near 20 and 9.7 µm and a very large number of such enhancements at visible wavelengths which are referred to as the diffuse interstellar features. A feature near 3.1 µm and other weaker infrared features appear in the extinction curves for sight lines passing through exceptionally dense interstellar clouds. The absorbing and scattering properties of solid particles depend on their size and composition. A detailed interpretation of the interstellar extinction curve of the illustration and other data relating to interstellar dust suggests that the interstellar grains of dust range in size from about 0.01 to 1 µm and are composed of silicate grains (to explain the 20and 9.7-µm features) and probably some type of carbon grain (to explain the 217.5-nm extinction peak). The 3.1-µm feature in the extinction curve implies that the interstellar dust acquires coatings of water ice in the densest regions of interstellar space. Other ices, including those of ammonia (NH3) and carbon monoxide (CO), are likely responsible for weaker infrared features observed in the densest interstellar clouds. A comparison of interstellar extinction with the absorption by interstellar atomic hydrogen reveals that the dust contains about 1% of the mass of the interstellar medium. See INTERSTELLAR MATTER. Blair D. Savage Bibliography. B. T. Draine, Interstellar dust grains, Annu. Rev. Astron. Astrophys., 41:241–289, 2003; J. S. Mathis, Interstellar Dust and Extinction, Annu. Rev. Astron. Astrophys., 28:37–70, 1990; D. C. B. Whittet, Dust in the Galactic Environment, 2d ed., 2002.
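As a worked illustration of the extinction relations quoted above (the optical depths here are chosen only for illustration), a cloud with τλ = 1 transmits about 37% of the background starlight, while τλ = 3 transmits only about 5%:

```latex
\tau_\lambda = 1:\quad \frac{F_\lambda}{F_\lambda(0)} = e^{-1} \approx 0.37,
\qquad A_\lambda = 1.086\,\tau_\lambda \approx 1.1\ \mathrm{mag}

\tau_\lambda = 3:\quad \frac{F_\lambda}{F_\lambda(0)} = e^{-3} \approx 0.050,
\qquad A_\lambda \approx 3.3\ \mathrm{mag}
```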
Interstellar matter The gaseous material between the stars, constituting several percent of the mass of stars in the Milky Way Galaxy. Being the reservoir from which new stars are born in the Galaxy, interstellar matter is of fundamental importance in understanding both the processes leading to the formation of stars, including the solar system, and ultimately to the origin of life in the universe. Among the many ways in which interstellar matter is detected, perhaps the most familiar are attractive photographs of bright patches of emissionline or reflection nebulosity. However, these nebulae furnish an incomplete view of the large-scale distribution of material, because they depend on the proximity of one or more bright stars for their illumination. Radio observations of hydrogen, the dominant form of interstellar matter, reveal a widespread distribution throughout the thin disk of the Galaxy, with concentrations in the spiral arms. The disk is very thin (scale height 135 parsecs for the cold material; 1-pc is equal to 3.26 light-years, 1.92 × 1013 mi, or 3.09 × 1013 km) compared to its radial extent (the distance from the Sun to the galactic center is about 8000 pc, for example). Mixed in with the gas are small solid particles, called dust grains, of
characteristic radius 0.1 micrometer. Although by mass the grains constitute less than 1% of the material, they have a pronounced effect through the extinction (absorption and scattering) of starlight. Striking examples of this obscuration are the dark rifts seen in the Milky Way, and the Coalsack Nebula in the southern sky. On average, the density of matter is only 15 hydrogen atoms per cubic inch (1 hydrogen atom per cubic centimeter; in total, 2 × 10−24 g · cm−3), but because of the long path lengths over which the material is sampled, this tenuous medium is detectable. Radio and optical observations of other spiral galaxies show a similar distribution of interstellar matter in the galactic plane. A hierarchy of interstellar clouds, concentrations of gas and dust, exists within the spiral arms. Many such clouds or cloud complexes are recorded photographically. However, the most dense, which contain interstellar molecules, are often totally obscured by the dust grains and so are detectable only through their infrared and radio emission. These molecular clouds, which account for about half of the interstellar mass, contain the birthplaces of stars. See GALAXY, EXTERNAL; MILKY WAY GALAXY.
Gas. Except in the vicinity of hot stars, the interstellar gas is cold, neutral, and virtually invisible. However, collisions between atoms lead to the production of the 21-cm radio emission line of atomic hydrogen. Because the Milky Way Galaxy is quite transparent at 21 cm, surveys with large radio telescopes have produced a hydrogen map of the entire Galaxy. Different emission regions in the Galaxy along the same line of sight orbit in the Galaxy with systematically different velocities relative to Earth and so are distinguishable by their different Doppler shifts. Supplemental information is obtained from 21-cm absorption-line measurements when hydrogen is located in front of a strong source of radio emission. These radio studies show that the gas is concentrated in clouds within the spiral arms, with densities typically 100 times the average of 15 atoms per cubic inch (1 atom per cubic centimeter). A typical size is 10 pc, encompassing the equivalent of 500 solar masses. These cold clouds (−316°F or 80 K) appear to be nearly in pressure equilibrium with a more tenuous warmer (14,000°F or 8000 K) phase of neutral hydrogen (whose mass totals that in clouds) and with even hotter ionized coronal-type gas (discussed below) in which they are embedded. See RADIO ASTRONOMY. Other species in the gas are detected by the absorption lines they produce in the spectra of stars, and so are observable only in more local regions of the Milky Way Galaxy. Interstellar lines are distinguished from stellar atmospheric lines by their extreme narrowness and different Doppler shifts. High-dispersion spectra show that the lines are composed of several components possessing unique velocities which correspond to the individual clouds detected with the 21-cm line. Elements such as calcium, sodium, and iron are detected in optical spectra, but the more abundant species in the cold gas, such as hydrogen, carbon, nitrogen, and oxygen, produce lines only in the ultraviolet and re-
quire observations from satellites outside the Earth’s atmosphere. A broad ultraviolet line of O VI (oxygen atoms from which five electrons have been removed) has also been discovered; collisional ionization to such an extent requires a very hot gas (∼3 × 105 K). Such hot gas emits soft x-rays; a Local Bubble of hot gas, extending to about 100 pc from the Sun, is a dominant contributor to the diffuse x-ray background radiation. A similar hot ionized component, called the coronal gas, occupies about half of the volume of interstellar space at the galactic midplane and extends significantly above the disk, but because of its low density (0.045 in.−3 or 0.003 cm−3) its contribution to the total mass density is small. See X-RAY ASTRONOMY. When the Galaxy formed, only the elements hydrogen and helium (and traces of lithium) were present. In the course of the evolution of a star, heavier elements are built up by nuclear burning processes in the hot interior. Obviously, the interstellar gas will be gradually enriched in these heavy elements if some of this processed material can be expelled by the star. The most significant such event identified is the supernova explosion due to core collapse, which can occur late in the lifetime of a star whose original mass exceeds about 8 solar masses. A large fraction of the stellar mass is ejected, and in the course of the violent detonation further nucleosynthesis can occur. Also important are supernova explosions of white dwarfs and perhaps nova explosions on white dwarf surfaces. Computer simulations of the burning in supernova shocks predict remarkably large relative abundances of the isotopes. Lower-mass stars might be needed to explain the abundance of some elements such as carbon and nitrogen, the relevant mass loss being by stellar winds from red giants, culminating in the planetary nebula phase. See NOVA; NUCLEOSYNTHESIS; PLANETARY NEBULA; SUPERNOVA. Through many cycles of star formation and mass loss from dying stars, the present interstellar element abundances have been built up to the levels seen in the atmospheres of relatively recently formed stars, including the Sun. The elements other than hydrogen and helium (principally carbon, nitrogen, and oxygen) constitute about 2% by mass. Direct examination of the cool interstellar gas using interstellar absorption lines reveals a considerable depletion of heavy elements relative to hydrogen. The atoms missing in the interstellar gas can be accounted for by the interstellar matter seen in solid particle form. Molecules. In interstellar space, molecules form either on the surfaces of dust particles or by gasphase reactions. In dense (greater than 1600 in.−3 or 100 cm−3) clouds where molecules are effectively shielded from the ultraviolet radiation that would dissociate them, the abundances become appreciable. These are called molecular clouds. Hydrogen is converted almost completely to its molecular form, H2. The next most abundant molecule is carbon monoxide (CO), whose high abundance may be attributed to the high cosmic abundances of carbon and oxygen and the great stability of carbon monoxide. The presence of molecules in interstellar space
Interstellar matter was first revealed by optical absorption lines. Unfortunately, most species produce no lines at optical wavelengths, and so the discovery of large numbers of molecules had to await advances in radio astronomy, for it is in this spectral band that molecular rotational transitions take place. Most lines are seen in emission. Dozens of molecules, with isotopic variants, are now known, the most widespread being carbon monoxide (CO), hydroxyl (OH), and formaldehyde (H2CO). The most abundant molecule, H2, has no transitions at radio wavelengths, but its presence in molecular clouds is inferred by the high density required for the excitation of the emission lines of other species and the lack of 21-cm emission from atomic hydrogen. In localized regions which have been heated by a shock or stellar wind to several thousand kelvins, the quadrupole infrared rotationalvibrational transitions of H2 have been detected. Molecular hydrogen has also been observed in less opaque clouds by ultraviolet electronic transitions seen in absorption (carbon monoxide is also seen this way). Molecular clouds. The existence of molecular clouds throughout the galactic disk has been traced by using the 2.6-mm emission line of carbon monoxide in a manner analogous to mapping with the 21-cm line. About half of the interstellar hydrogen is in molecular rather than atomic form, with enhancements in the inner 500 pc of the Galaxy and in a 3–7 kiloparsec ring. Molecular clouds exist primarily in the spiral arms, where they are thought to form behind the spiral shocks. The molecular clouds are quite cold (−441◦F or 10 K), unless they have an internal energy source such as a newly forming protostar. They exist in complexes with the following typical characteristics: radius, 20 pc; density, 1.5 × 104 in.−3 (1 × 103 cm−3); mass, 105 solar masses. Within the complexes are more dense regions, whose counterparts are locally identified with the optically defined dark nebulae (discussed below). Only in the largest and most dense condensations can the rarer polyatomic (largely organic) molecules be seen. One of the chief such regions is the Orion Molecular Cloud OMC1, of which the Orion Nebula H II region is a fragment ionized by newly formed massive stars. The other region is the 5–10-pc core of the Sagittarius B2 cloud near the galactic center. Altogether, 114 molecules and molecular ions have been detected, many of which could be classed as prebiotic. It is interesting that the first stages of organic evolution can occur in interstellar space. Unlike diffuse clouds, which are confined by the pressure of the surrounding medium, molecular clouds are confined by their own gravity. Molecular clouds are therefore susceptible to collapse into progressively denser phases, a sequence leading to the formation of new stars—usually in clusters of hundreds or thousands. See MOLECULAR CLOUD; ORION NEBULA; STAR FORMATION. Masers. An intriguing phenomenon seen in some molecular emission lines is maser amplification to very high intensities. The relative populations of the energy levels of a particular molecule are determined by a combination of collisional and radiative exci-
tation. The population distribution is often not in equilibrium with either the thermal gas or the radiation field; and if a higher-energy sublevel comes to have a higher population than a lower one, the electromagnetic transition between the states is amplified through the process of stimulated emission. Such nonequilibrium occurs in interstellar space in two distinct environments. See MASER. The best-understood masers are the hydroxyl (OH), water (H2O), and silicon monoxide (SiO) masers in the circumstellar envelopes of cool masslosing red supergiant stars, because the relevant physical conditions relating to density, temperature, and radiation and velocity fields can be estimated independently from other observations. Positional measurements made possible by very long baseline interferometry show that the silicon monoxide and water masing regions are within a few stellar radii of the star, whereas the hydroxyl masers arise in less dense regions more than 100 stellar radii out. Even here the identification of the detailed pump (inversion-producing) mechanism is difficult, but it seems that the 1612-MHz hydroxyl maser is pumped by far-infrared emission by warm circumstellar dust grains and that the silicon monoxide masers are pumped by collisions. The other masers are found in dense molecular clouds, in particular near compact sources of infrared and radio continuum emission identified as massive stars just being formed. Interstellar hydroxyl and water masers are widespread throughout the Galaxy, methyl alcohol (CH3OH) masing is seen in OMC1, and silicon monoxide masers are seen in OMC1 and two other star-forming regions. The strongest hydroxyl masers are identified with underlying compact H II regions, probably occurring near the boundary. Although water masers are usually close to such compact H II regions, they are not coincident with the regions of radio or infrared continuum emission. Interstellar masers appear to require collisional pumping, which is thought to be due to powerful jets emitted from stars as they accrete. Peter G. Martin; Christopher D. Matzner Chemical composition. At present, 123 interstellar molecular species are identified (see table). Mainly diatomic species are known, but recently a few more complex species have been seen in diffuse clouds 5 magnitudes); while nearly all species exist in warm dense star-forming cores in giant molecular clouds. The known molecules are largely organic. Only 21 are inorganic and stable. There are 36 stable organic species, and 67 unstable species, mostly organic, which include 33 free radicals, 14 (positive) ions, 4 simple rings, and 12 carbon chains as complex as HC11N. Most of the unstable species were unknown terrestrially before their interstellar identification. The most rapidly growing list is that of the ions and radicals, because of new laboratory synthesis techniques. Carbon chemistry is dominant, and long carbonchain species are most prominent among organic
Inorganic species (stable)∗
  Diatomic: H2, CO, CS, SiO, SiS, PN, HCl, HF
  Triatomic: H2O, H2S, SO2, OCS, HNO, N2O, SiH2?
  4-atom: NH3
  5-atom: SiH4

Organic molecules (stable)∗
  Alcohols: CH3OH (methanol), CH3CH2OH (ethanol), CH2CHOH (vinyl alcohol), HOCH2CH2OH (ethylene glycol)
  Aldehydes and ketones: H2CO (formaldehyde), CH3CHO (acetaldehyde), H2CCO (ketene), (CH3)2CO (acetone), HCCCHO (propynal), HOCH2CHO (glycol aldehyde)
  Acids: HCN (hydrocyanic), HCOOH (formic), HNCO (isocyanic)
  Hydrocarbons: CH2 (methylene), CH4 (methane), C2H2 (acetylene), C2H4 (ethylene), C6H6 (benzene)?
  Amides: NH2CHO (formamide), NH2CN (cyanamide), NH2CH3 (methylamine)
  Esters and ethers: CH3OCHO (methyl formate), (CH3)2O (dimethyl ether)
  Organosulfur: H2CS (thioformaldehyde), HNCS (isothiocyanic acid), CH3SH (methyl mercaptan)
  Paraffin derivatives: CH3CN (methyl cyanide), CH3CH2CN (ethyl cyanide)
  Acetylene derivatives: HC3N (cyanoacetylene), CH3C2H (methylacetylene), CH3C4H (methyldiacetylene)
  Rings: C2H4O (ethylene oxide)
  Other: CH2NH (methylenimine), CH2CHCN (vinyl cyanide), NaCN (sodium cyanide)

Unstable molecules∗
  Radicals: OH, NO, NS, SO, NH2, CH, SH, CN, HCO, CH3, C2, C2H, C2O, C2S, HCCN, C3, C3N, C5N, C3O, C4H, C5, C5H, C6H, C7H, C8H, CH2CN
  Ions: H3+, H2D+?, CH+, N2H+, H3O+, SO+, CO+, HCO+, HCS+, HOCO+, HCNH+, CH2D+?, HC3NH+, H2COH+
  Rings: SiC2, c-C3H2, c-C3H, SiC3
  Carbon chains: l-C3H, C3S, HC5N, HC7N, HC9N, HC11N, CH3C3N, CH3C5N?, C4Si, H2CCC, H2CCCC, H2CCCCCC
  Isomers: HNC, CH3NC, HCCNC, HNCCC, HOC+
  ScCN

∗Molecules followed by asterisks are detected only in circumstellar envelopes (surrounding evolved, mass-losing, red giant stars), not yet in the interstellar medium.
compounds. Single, double, and triple carbon bonds are equally prevalent. No branched-chain species are known. Carbon-oxygen and carbon-nitrogen species are of comparable number, while nitrogen-oxygen bonds are rare. The molecules mostly contain carbon, oxygen, and nitrogen, reflecting the high cosmic abundances of these elements (∼10−4 that of hydrogen). At ∼10−5 of hydrogen, sulfur and silicon are seen in fewer species, although organosulfur compounds (such as C2S and C3S) are noteworthy. Phosphorus, magnesium, aluminum, and sodium are seen in only one or two interstellar species, while those containing aluminum, chlorine, fluorine, and potassium arise also in unusual thermochemical equilibrium conditions in the innermost parts of
circumstellar envelopes. Despite cosmic abundances similar to sulfur and silicon, the refractory elements magnesium and iron are underrepresented and are presumed locked in interstellar dust grains. See ELEMENTS, COSMIC ABUNDANCE OF; FREE RADICAL; ORGANOSULFUR COMPOUND. Relative to H2, into which virtually all hydrogen combines in molecular clouds, all other species occur only in trace amounts, ranging in abundance from carbon monoxide (CO) at 2 × 10−4 down to HC11N just detectable at 2 × 10−12. The simplest species are usually the most abundant, although many complex species are surprisingly abundant also. As interstellar chemistry is far from equilibrium, abundances generally bear no relation to the
Interstellar matter constituent atomic abundances or to any thermodynamic properties. Astrochemistry. Because of the very low temperatures and densities of the interstellar medium, molecules cannot form from atoms by normal terrestrial processes. Astrochemists have focused on three different chemistries. Most important is gas-phase chemistry, particularly involving molecular ions. Ionmolecule reactions satisfy the requirements of minimal activation energy and of rapid two-body rates. The (positive) ions are initiated by the cosmic-ray ionization of H2, producing H3+, which then reacts with other abundant species such as carbon monoxide (CO) and N2 to produce a large number of the observed species such as HCO+ and N2H+. Ion fragments themselves react rapidly at low temperatures to form larger ions, which eventually recombine with electrons or neutralize by reaction with easily ionized metals. Slower reactions involving neutral atoms and molecules also produce several species. All of the observed molecular ions and a sizable number of the neutral species are successfully modeled by these processes, both in diffuse clouds where photochemistry is also important, and in cold and warm dense clouds opaque to ultraviolet radiation, which harbor many more species. Most of the observed patterns of species are explained. However, several complex molecules are not readily explained, nor is the mere existence of a gas phase in the cold dense clouds, given the rapid rate at which all molecules are expected to adsorb onto dust grains at low temperatures. The chemistry of interstellar grains takes several forms. Grains can act as passive repositories of frozen molecular material while a cloud is in a cold dense phase, and the material later evaporates without chemical modification if the cloud warms as a result of nearby star formation. The same frozen material can undergo chemical reactions within the icy mantle, modifying the chemical composition before subsequent evaporation. Finally, molecules may be catalyzed from interstellar atoms arriving on dust grains. The ubiquitous H2 molecule is the best example of a species that cannot form in the gas phase at sufficient rate but that catalyzes on grains and desorbs efficiently because of its high volatility. Details of the catalysis mechanism and the nature of the grain surfaces and composition are lacking, but experiments and models suggest many species can form readily on both ice-covered grains (cold clouds) and bare silicate or carbonaceous grains (warm clouds). In cold clouds especially, desorption is problematic, and the difficulty of desorption probably accounts for the lack both of refractoryelement compounds and of several of the more complex species that also present problems for gas-phase formation. Recent searches for new species have targeted large complex stable molecules, such as are expected to form on grains. The two new alcohols and one new aldehyde suggest a chemistry that forms aldehydes and their corresponding reduced alcohols—for example, H2CO/CH3OH, long accepted as forming on grains from CO and H.
Whether the next steps—CH3CHO/CH3CH2OH or HOCH2CHO/HOCH2CH2OH—can occur on grains is unknown. No known gas-phase formation is possible. It is believed that large molecules do form on grains, but desorb only by evaporation driven by nearby star formation, and then react in the gas phase with other gaseous species to form even larger molecules. See HETEROGENEOUS CATALYSIS. Strong shocks abound in the interstellar medium, resulting from supernovae and expanding H II regions. These shocks briefly heat and compress the gas, producing required conditions for many hightemperature chemical reactions. The resulting shock chemistry has proved difficult to simulate. Models depend strongly on the type of shock assumed (dissociative or nondissociative, with or without a magnetic precursor) and do not yet provide a firm basis for comparison with observations. Certain highly abundant species such as hydroxyl (OH) and water (H2O) in their maser forms, some inorganic sulfur species, and the refractory element species (SiO, SiS, PN) are the best candidates for shock formation. Besides shocks and heating effects in star-forming regions, other processes also give rise to time dependence in interstellar chemistry. Ion-molecule chemistry in quiescent cold clouds predicts that the abundances of many species, especially the complex carbon-rich compounds (such as HC3N and HC5N), are up to 1000 times greater at ages of the order of 105 years than after the gas-phase chemistry reaches steady state (≥107 years), when essentially all carbon ends up as carbon monoxide (CO). Many clouds are therefore chemically young, implying a short lifetime, a dynamical mixing mechanism which renews the chemistry, or an overabundance of carbon. See SHOCK WAVE. Isotopic composition. All of the most abundant isotopes of the elements occurring in interstellar molecules have been observed: H, D; 12C, 13C; 16O, 18 O, 17O; 14N, 15N; 32S, 34S, 33S; 28Si, 29Si, 30Si; and 25Mg, 26 Mg. The relative abundances of the isotopically substituted molecules may differ from the nuclear isotopic ratios because of chemical fractionation, which favors the incorporation of heavier isotopes because of the reduced zero-point vibrational energy of the corresponding molecule. The effect is small for elements whose isotopic mass ratios are small, but becomes noticeable for 13C/12C, and is enormous for deuterium/hydrogen (D/H), where the concentration factor favoring deuterium may approach 104, depending strongly upon the temperature and ionization fraction. Thus D/H cannot be determined from molecular observations; a value of ∼1.5 × 10−5 is found for the local interstellar medium from ultraviolet observations of the atomic species. The other isotopic ratios can be found reliably from the molecules. Several of these ratios vary with distance from the galactic center. Because the deuterium fractionation strongly increases with decreasing temperature, the deuterated ratio XD/XH has proved an important thermometer at the very lowest temperatures. Many multiply deuterated species have been recently detected, including
D2CO, D2H+, D2O, D2S, ND2H, D3+, ND3. To fully replace all the H atoms with D atoms when D/H ∼1.5 × 10−5, it appears that H2D+ is dramatically enhanced in gas depleted at least by 98% by molecules other than H2. Starting with the primordial elements hydrogen and helium, nuclear processing in stars forms the heavier elements and their isotopes, which are continuously cycled between stars and the interstellar medium. The concentrations of these elements and isotopes in the gas as a function of time depend on the rates of star formation, the stellar mass distribution, nuclear processes within the stars, and the rates of ejection of stellar material into the interstellar medium. The common isotopes 12C and 16O are formed first, and 13C, 17O, 18O, 14N, and 15N are formed from them by secondary processes of stellar nucleosynthesis. All grow in abundance with time. For fast rates of stellar processing appropriate to the galactic center, the 12C/13C ratio is expected to decrease to a steady value of ∼20 after 5 × 109 years. For slower rates of stellar processing characteristic of the galactic disk, 12C/13C should decrease more slowly, reaching ∼90 when the Earth was formed 4.5 × 109 years ago, and ∼70 at the present age of the Galaxy. An observed decrease in 18O/17O and a decrease in 15N toward the galactic center are also consistent with advanced nucleosynthesis there. Stellar processing is not the only source of new elements: several isotopes of Li, Be, and B are produced primarily via spallation of interstellar C, N, and O nuclei by cosmic rays. See ISOTOPE. Barry E. Turner Particles. The light of stars near the galactic disk is dimmed by dust grains, which both absorb the radiation and scatter it away from the line of sight. The amount of extinction at optical wavelengths varies approximately as the reciprocal of the wavelength, resulting in a reddening of the color of a star, much as molecular scattering in the Earth's atmosphere reddens the Sun, especially near sunrise and sunset. The dependence of extinction on wavelength is much less steep than it is for molecular scattering, indicating the solid particles have radii of about 0.1 micrometer (comparable to optical wavelengths). Satellite observations show a continued rise in the extinction at ultraviolet wavelengths, which seems to require a size distribution extending to smaller interstellar grains. See INTERSTELLAR EXTINCTION. By comparison of the observed color of a star with that predicted from its spectral features, the degree of reddening or selective extinction can be fairly accurately determined, but the total extinction at any given wavelength is more difficult to measure. On average, extinction by dust across 1000 pc in the galactic plane reduces a star's visual brightness by 80%. This requires the mass density of grains to be about 1% of the gas density. Since pure hydrogen or helium grains cannot exist in the interstellar environment, a major fraction of the heavier elements must be in the solid particles. The characteristic separation of micrometer-sized grains is about 8 mi (13 km). Studies of reddening in conjunction with measurements of the 21-cm and ultraviolet Lyman-α lines of hydrogen and the 2.6-mm line of carbon monox-
ide show that dust and gas concentrations are well correlated. This correlation is borne out in infrared maps of diffuse 100-µm thermal emission from dust, made in the all-sky survey by the Infrared Astronomical Satellite (IRAS). Because of the high concentration of interstellar material toward the galactic plane, it is extremely difficult to detect radiation with a wavelength less than 1 µm coming a large distance through the plane. Conversely, a line of sight to a distant object viewed out of the plane is much less obscured because the disk is so thin. The zone of avoidance, corresponding roughly to the area occupied by the Milky Way, is that region of the sky in which essentially no extragalactic object can be seen at optical wavelengths because of intervening dust. The dark rifts in the Milky Way result from the same obscuration. The component of starlight scattered rather than absorbed by the grains can be detected as a diffuse glow in the night sky near the Milky Way. However, it must be carefully separated from other contributions to the night sky brightness: the combined effect of faint stars, zodiacal light from dust scattering within the solar system, and airglow (permanent aurora). See AIRGLOW; AURORA; INTERPLANETARY MATTER; ZODIACAL LIGHT. Interstellar dust particles are heated by the optical and ultraviolet interstellar radiation field and, being relatively cool, reradiate this absorbed energy in the infrared. Such thermal radiation accounts for a third of the bolometric luminosity of the Milky Way Galaxy. Measurements by IRAS at 100 and 60 µm are in accord with emission expected from the grains whose properties are deduced from extinction. On the other hand, the unexpectedly large portion (25%) at shorter wavelengths (25 and 12 µm) has been taken to indicate a significant population of very small (∼0.001 µm) grains or large molecules. These have such small heat capacity that they can be heated transiently to a relatively high temperature by the absorption of a single photon. The light of reddened stars is partially linearly polarized, typically by 1% but reaching 10% for the most obscured stars. The broad peak in polarization at yellow light, together with the correlation of the degree of polarization with reddening, suggests that the polarization and extinction are caused by the same dust grains. The grains must be both nonspherical and spinning about a preferred direction in space to produce polarization. The direction of this polarization is ordered across the sky, often parallel to the galactic plane. This is believed to be due to the organized galactic magnetic field. The field strength is very small; measurements based on the Zeeman effect in the 21-cm line of atomic hydrogen and the 18-cm line of hydroxyl indicate a few microgauss (1 µG = 10−10 tesla), with some compression in regions of higher density. Minute amounts (0.01%) of circular polarization have also been used to study the topology of the magnetic field. The possible types of grain material can be restricted through considerations of relative cosmic abundances, but detailed identification is difficult because of the paucity of spectral features in the
Interstellar matter extinction curve. Silicates are suggested by an excess of absorption at 10 µm in front of strong infrared sources, and in the ultraviolet an absorption peak at 220 nanometers could be explained by a component of small graphite particles. Spatially extended red emission in excess of that expected from scattering in reflection nebulae has been interpreted as evidence for fluorescence of hydrogenated amorphous carbon solids. A popular theory of grain formation begins with the production of small silicate and carbon particles in the extended atmospheres of red supergiant stars. While in the circumstellar region, these grains are warmed by the starlight, so that they are detectable by their thermal emission on the near infrared (10 µm). Radiation pressure ejects these particles into the interstellar gas, where they become much colder (−433◦F or 15 K). A dielectric mantle is then built up by accretion of the most abundant elements, hydrogen, carbon, nitrogen, and oxygen. Ices of water and carbon monoxide are detected by 3.1- and 4.67-µm absorption bands, respectively, but only deep within molecular clouds; these are probably in the form of volatile coatings accreted on more refractory grains. Dark nebulae. A cloud of interstellar gas and dust can be photographed in silhouette if it appears against a rich star field; however foreground stars make distant clouds indiscernible. Many large dark nebulae, or groups of nebulae, can be seen in the Milky Way where the material is concentrated; these are coincident with molecular clouds. The distance to a dark nebula can be estimated by using the assumption that statistically all stars are of the same intrinsic brightness. When counts of stars within a small brightness range are made in the nebula and an adjacent clear region, the dimming effect of the cloud will appear as a sudden relative decrease in the density of stars fainter than a certain apparent brightness, which corresponds statistically to a certain distance. Alternatively, a lower limit to the distance is provided by the distance to the most distant unreddened stars in the same direction. One of the best-known and nearest dark nebulae is the Coal Sack, situated at a distance of 175 pc and visible in the Southern Hemisphere. Another example is the “Gulf of Mexico” area in the North America Nebula. Obscuring clouds of all sizes can be seen against the bright H II regions described below. In many cases the H II regions and dark nebulae are part of the same cloud. The bay in the Orion Nebula is one such region, as well as the spectacular Horsehead Nebula. Even smaller condensations, called globules, are seen in the Rosette Nebula (Fig. 1) and NGC 6611 (M16; Fig. 2). The globules, which are almost completely opaque, have masses and sizes which suggest they might be the last fragments accompanying the birth of stars. See GLOBULE; NEBULA. Bright nebulae. An interstellar cloud can also become visible as a bright nebula if illuminated by a nearby bright star. Whether an H II region or reflection nebula results depends on the quantity of ionizing radiation available from the star. To be distinguished from H II regions, but often also called bright gaseous nebulae, are shells of gas that have
Fig. 1. Quadrant of the shell-shaped Rosette Nebula, showing dense globules of obscuring dust and gas silhouetted on the bright emission-line background of an H II region. The central hole is swept clear of gas by winds, supernovae, and ionization from newly formed massive stars; radiation pressure from the central star (lower left) acting on dust grains has also been suggested. The quadrant was photographed in red light with the 48-in. (122-cm) Schmidt telescope of the Palomar Observatory. (California Institute of Technology/Palomar Observatory)
been ejected from stars. Included in this latter category are planetary nebulae and nova shells, which have a bright emission-line spectrum similar to that of an H II region, and supernova remnants. See CRAB NEBULA. H II regions. A star whose temperature exceeds about 45,000°F (25,000 K) emits sufficient ultraviolet radiation to completely ionize a large volume of the surrounding hydrogen. The ionized regions (Figs. 1 and 2), called H II regions, have a characteristic red hue resulting from fluorescence in which hydrogen, ionized by the ultraviolet radiation, recombines and emits the Hα line at 656.3 nm. Optical emission lines from many other elements have been detected, including the "nebulium" line of oxygen at 500.7 nm. H II regions are also conspicuous sources of free-free radio emission characteristic of close
Fig. 2. NGC 6611 (M16), a complex H II region in which the exciting stars are members of a cluster. Note the dark globules and elephant-trunk structures, and the bright rims where ionizing radiation is advancing into more dense neutral gas. The nebula was photographed in Hα + [N II] with the 200-in. (508-cm) telescope of the Palomar Observatory. (California Institute of Technology/Palomar Observatory)
electron-proton encounters in the 14,000°F (8000 K) gas. Some H II regions are seen only as radio sources because their optical emission is obscured by dust grains. Radio recombination lines of hydrogen, helium, and carbon, which result when the respective ions recombine to highly excited atoms, are also important. An H II region can be extended with a relatively low surface brightness if the local density is low, as in the North America Nebula. However, the best-known regions, such as the Orion Nebula, are in clouds that are quite dense (1.5 × 104 to 1.5 × 105 H atoms per cubic inch, or 1 × 103 to 1 × 104 H atoms per cubic centimeter) compared to the average; dense clouds absorb ionizing radiation in a smaller region, and hence produce a brighter fluorescence. Since the brightest stars are also the youngest, it is not surprising to find them still embedded in the dense regions from which they formed. Ionization of molecular gas increases its temperature from 30 K to 8000 K, and increases its pressure correspondingly. Enhanced pressures cause the H II region to expand explosively away from the forming star, often leading it to blow out through the side of the molecular cloud (the "blister phase," Fig. 2). The cloud mass unbound in this process is typically ten times that of the stellar association causing it, so H II regions are responsible for much of the inefficiency of star formation. Reflection nebulae. In the absence of sufficient ionizing flux, the cloud may still be seen by the light reflected from the dust particles in the cloud. The scattering is more efficient at short wavelengths, so that if the illuminating star is white the nebula appears blue. The absorption or emission lines of the illuminating star appear in the nebular spectrum as well. Reflection nebulae are strongly polarized, by as much as 40%. Both the color and the polarization can be explained by dust grains similar to those which cause interstellar reddening. Extended near-infrared (2-µm) emission in some bright reflection nebulae provides evidence for thermal emission from transiently heated small grains. There are also numerous near-infrared spectral emission features consistent with C-H and C-C bending and stretching vibrations, and suggestive of polycyclic aromatic hydrocarbons (PAHs). Whether such compounds exist as free-flying molecules (or small grains), as coatings on larger dust grains, or both, is undecided. Some reflection nebulae, such as those in the Pleiades, result from a chance encounter between a cloud and an unrelated bright star, providing a unique look at interstellar cloud structure. Other reflection nebulae appear to be intimately related to stars in early or late stages of stellar evolution. The Orion Nebula has an underlying reflection nebula arising from the dust in the gas cloud which produced the H II region. However, in this and other H II regions the emission-line radiation rather than the reflected light dominates the nebulosity. Supernova remnants. Long after an exploding star has faded away, its supernova remnant (SNR) can still be seen. These are composed of gas expelled from the star itself and interstellar gas impelled into mo-
tion by it. Supernova remnants provide material to enrich the interstellar gas in heavy elements. The gas shell initially expands at up to 6000 mi · s−1 (10,000 km · s−1); during this phase the supernova remnant is a strong emitter of radio waves through the synchrotron emission process in which internally accelerated relativistic electrons (cosmic rays) spiral in a magnetic field which has been amplified by turbulent and compressive motions. Later the supernova remnant is decelerated as it plows up more and more of the surrounding interstellar gas. Compression of the ambient magnetic field and cosmicray electron components of this gas leads again to a radio synchrotron source, with a characteristic shell structure. The supernova remnant generates a shock where it impinges on the interstellar medium, and because of the high relative velocity, temperatures of 105–106 K are reached in the post shock gas. This hot gas has been detected by its x-radiation in both emission lines and the free-free continuum. As the gas cools behind the shock, it produces optical emission lines too, giving rise to the bright shells seen in wide-field optical photographs (for example, the Veil Nebula which is part of the Cygnus Loop supernova remnant). Supernovae caused by the collapse of a massive star’s core produce a compact remnant, either a black hole or a neutron star (depending on the mass of the core before collapse). In the case of a magnetized, spinning neutron star, a pulsar will be observed within the supernova remnant. Pulsars emit a wind of relativistic particles and tightly wound magnetic fields, which illuminate the pulsar wind nebulae found within some supernova remnants via synchrotron emission (such as the Crab Nebula around the Crab Pulsar). Notably, no pulsar has been detected to date from the supernova 1987A in the Large Magellanic Cloud. Supernova remnants provide an important source of energy for heating and driving turbulent motions in the cooler interstellar gas, and perhaps for accelerating cosmic rays. In low-density regions their influence can propagate through large regions of space, and they are probably responsible for maintaining the high-temperature coronal gas seen throughout the galactic disk. See COSMIC RAYS. Magnetic fields and cosmic rays. The interstellar magnetic field, initially inferred from polarization of starlight, has been measured by means of the Zeeman splitting of the circularly polarized 21-cm emission, by the Faraday rotation of linearly polarized radio signals from pulsars and external galaxies, and from the synchrotron emission due to cosmic-ray particles spiraling about the field lines. The mean interstellar field strength is roughly 5 microgauss, or 10−5 of Earth’s surface field. Magnetic field lines are efficiently coupled to charged particles, causing fields and gas to behave as a single fluid with properties of both components—notably, shearing of field lines engenders an opposing tension force. Magnetic forces contribute significantly to the support of gas against gravity, both above and below the midplane of the galaxy, and within individual molecular clouds where field strengths are 30 µG.
Interstellar matter Cosmic rays are energetic charged particles— electrons, protons, and the nuclei of heavy elements—that pervade the interstellar medium. Cosmic rays observed at Earth include a low-energy variety produced by the Sun (which cause aurorae), interstellar particles of a wide range of energies (which cause genetic mutations and seed cloud formation) produced most likely by supernova remnants, and an extremely high energy population of mysterious extragalactic origin. Like interstellar gas, cosmic rays are effectively coupled to the magnetic field and affect the motion of the combined fluid. Indeed, cosmic-ray pressure is comparable to magnetic forces and turbulent motions in supporting gas away from the midplane. Cosmic rays contribute to the ionization and heating of the gas, but outside of dark clouds they are not as effective as the photoelectric absorption of far ultraviolet photons by dust grains. Their collisions with atomic nuclei produce light elements via spallation, and also produce π 0 mesons which decay to observable gamma-ray photons. See COSMIC RAYS. Star formation. Superluminous stars such as those exciting H II regions cannot be very old because of the tremendous rate at which they are exhausting their supply of hydrogen for nuclear burning; the most luminous are under 100,000 years old. With this clear evidence for recent star formation, observations have been directed toward discovering stars even closer to their time of formation. Compact H II regions, such as the Orion Nebula, appear to be the first fragments of much larger molecular clouds to have formed stars. Ultracompact H II regions, seen in molecular clouds at radio wavelengths but totally obscured at optical wavelengths, appear to be an earlier stage in which the protostellar core has just become highly luminous. These are often called cocoon stars. Even earlier still are the compact infrared sources in which hot dust grains in a protostellar cloud are detected at wavelengths of 5– 100 µm. These earliest phases are often associated with intense water and hydroxyl molecular maser emission. Examples of all of these stages are often found in the same region of space. In addition to the Orion molecular cloud, many regions such as the W3 radio and infrared sources associated with the visible H II region IC 1795 have been studied extensively. Both the broad spectral energy distribution of thermal infrared emission and the spatial distribution of molecular emission from dense gas indicate that the final gravitational collapse of an interstellar cloud fragment commonly proceeds through an accretion disk. A flattened geometry is indeed expected from conservation of angular momentum during the collapse. In the protostar stage, radiant energy is derived from gravitational potential energy released as material accretes onto the star through the disk. A commonly associated phenomenon is the ejection of substantial amounts of gas in high-velocity bipolar outflows along in the direction perpendicular to the disk. These outflows are observed as optical emission lines from collimated jets and from shocked knots called Herbig-Haro objects, and as radio emission
from carbon monoxide in extended lobes. Like H II regions, violent protostellar outflows are responsible for casting molecular gas away from forming stars and limiting the efficiency with which molecular clouds are converted into stars. See HERBIG-HARO OBJECTS; PROTOSTAR. Although the details are far from understood, there is evidence that star formation is a bimodal process, with more massive stars forming only in some giant molecular cloud complexes which aggregate in the spiral arms of the Milky Way Galaxy, and less massive stars occurring independently over the whole range of molecular cloud sizes. Low-mass stars are less luminous and cooler and so do not produce the diagnostic compact H II regions. In the nearby Taurus molecular cloud, numerous dense (greater than 1.5 × 105 in.−3 or 104 cm−3) stellar mass cores, detected by ammonia (NH3) emission, appear to be the immediate precursors of low-mass stars. This association is corroborated by their coincidence with other signposts of star formation: IRAS infrared sources, bipolar outflows, and T Tauri stars, which are lowmass pre-main-sequence stars. Molecular clouds are supported against wholesale gravitational collapse by turbulent motions, probably associated with waves in the interstellar magnetic field, and are eventually dispersed by mechanical energy which results from the star formation process itself (bipolar outflows, expanding H II regions, stellar winds). On average, the star formation rate corresponds to a 2% conversion efficiency of molecular cloud material, but occasionally a higher efficiency, in excess of 30%, can lead to a gravitationally bound cluster of stars such as the Pleiades or NGC 6811 (M16; Fig. 2). Overall, the Milky Way Galaxy processes about five solar masses of interstellar matter per year into new stars, somewhat larger than the rate at which dying stars replenish the gas, so that the present interstellar medium is decaying on a time scale of about 1 × 109 years. A supply of fresh gas from outside the galactic disk is suggested by the fact that this is one-tenth of the Galaxy’s age. See INFRARED ASTRONOMY; STELLAR EVOLUTION. Peter. G. Martin; Christopher D. Matzner Bibliography. W. B. Burton, B. G. Elmegreen, and R. Genzel, The Galactic Interstellar Medium, 1992; L. D’Hendecourt, C. Joblin, and J. Anthony (eds.), Solid Interstellar Matter: The ISO Revolution, Springer-Verlag, 1999; E. Levy, J. I. Lunine, and M. S. Matthews (eds.), Protostars and Planets III, 1993; V. Mannings, A. Boss, and S. Russell (eds.), Protostars and Planets IV, Unversity of Arizona Press, 2000; D. E. Osterbrock and G. J. Ferland, Astrophysics of Gaseous Nebulae and Active Galactic Nuclei, 2d ed., 2005; J. Palous, W. B. Burton, and P. O. Lindblad (eds.), Evolution of Interstellar Matter and Dynamics of Galaxies, 1992; B. E. Turner, Recent progress in astrochemistry, Space Sci. Rev., 51:235–337, 1989; E. F. van Dishoeck and G. A. Blake, Chemical evolution of star-forming regions, Annu. Rev. Astron. Astrophys., 36:317–368, 1998; T. L. Wilson and R. T. Rood, Abundances in the interstellar medium, Annu. Rev. Astron. Astrophys., 32:191–226, 1994.
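To make the arithmetic behind the closing star-formation figures explicit, the following short Python sketch combines the star formation rate and decay time scale quoted in the article; the implied gas reservoir and the assumed galactic age are rough consistency checks, not values stated in the text.

# Order-of-magnitude check on the gas-depletion argument in the star formation section.
# The rate and time scale are the figures quoted above; the galaxy age is an assumption
# implied by the statement that the decay time is one-tenth of the Galaxy's age.
star_formation_rate = 5.0     # solar masses of interstellar matter turned into stars per year
depletion_time = 1.0e9        # years, quoted decay time of the present interstellar medium
galaxy_age = 1.0e10           # years, assumed

implied_gas_reservoir = star_formation_rate * depletion_time
print(implied_gas_reservoir)          # ~5e9 solar masses of interstellar gas
print(depletion_time / galaxy_age)    # 0.1, i.e., one-tenth of the assumed galactic age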
Intestine The tubular portion of the digestive tract, usually between the stomach and the cloaca or anus. The detailed functions vary with the region, but are primarily digestion and absorption of food. The structure of the intestine varies greatly in different vertebrates (see illus.), but there are several common modifications, mainly associated with increasing the internal surface area. One, seen in many fishes, is the development of a spiral valve; this turns the intestine into a structure resembling a spiral staircase. Another, seen in some fish and most tetrapods, is simply elongating and then coiling the intestine. This can reach extremes in large herbivores: Oxen have intestinal lengths of over 150 ft (45 m). In numerous forms there are blind pouches, or ceca, off part of the intestine. In fish these are commonly at the anterior end; in tetrapods they generally lie at the junction between the large and small intestines. In all vertebrates the inner surface of the intestine is irregular, with ridges and projections of various sorts; these reach their maximum development in the extremely fine and numerous finger-shaped villi found in mammals. In humans the intestine consists of the small and large intestines. The small intestine is further divided into three major parts: the duodenum, the jejunum, and the ileum. The duodenum, 10–12 in. (25–30 cm) long, begins at the pyloric sphincter of the stomach and curves around the head of the pancreas on the right side of the anterior part of the abdomen. It receives the ducts of the biliary system and the pancreas. The jejunum and ileum are about 19 ft (6 m) long and form a much-coiled tube that empties at right angles into the large intestine through the ileocolic valve. The large intestine, or colon, consists of five parts: the ascending, transverse, descending, and sigmoid regions, and the terminal rectum which empties into the anal canal. The microscopic structure of the intestine comprises an inner glandular mucosa, a muscular coat, and an outer serosa of connective tissues which is covered in most areas by peritoneum. The intestine is supported by dorsal mesenteries of varying extent, which contain an extensive system of arteries, veins, lymphatics, and nerves to the various regions. See DIGESTIVE SYSTEM. Thomas S. Parsons
Intra-Americas Sea That area of the tropical and subtropical western North Atlantic Ocean encompassing the Gulf of Mexico, the Caribbean Sea, the Bahamas and Florida, the northeast coast of South America, and the juxtaposed coastal regions, including the Antillean Islands. Meteorology. Meteorologically, the Intra-Americas Sea (IAS) is a transition zone between truly tropical conditions in the south and a subtropical climate in the north. The annual variability of the intertropical convergence zone (ITCZ) controls the easterly direction (that is, from the east) of the trade
winds, where low surface pressures and rising air create the characteristic cumulus and cumulonimbus clouds. Except for winter when frontal passages bring westerly winds to the northern Intra-Americas Sea, the intertropical convergence zone dominates the weather. The Intra-Americas Sea is also the region that either spawns or interacts with the intense tropical storms known locally as the West Indian Hurricane. Typically there are 9 tropical storms per year, six of which become intense enough (74-mi/h or 119-km/h surface winds) to be classified as hurricanes. These storms are destructive because of their winds and storm surge, but they provide significant amounts of the annual rainfall to many of the Intra-Americas Sea islands and coastal zones. Air flowing over the Intra-Americas Sea acquires moisture that is the source of much of the precipitation over the central plains of North America. Its quality has caused some meteorologists to give it the name American Monsoon. See HURRICANE; MONSOON METEOROLOGY; TROPICAL METEOROLOGY. Circulation. Ocean currents of the Intra-Americas Sea are dominated by the Gulf Stream System. Surface waters flow into the Intra-Americas Sea through the passages of the Lesser Antilles, and to a lesser extent through the Windward Passage between Cuba and Haiti, and the Anegada Passage between Puerto Rico and Anguilla. These inflowing waters form the Caribbean Current, which flows westward and northward into the Gulf of Mexico through the Yucat´an Channel, where it is called the Yucat´an Current. Here it is clearly recognized as a western boundary current because of its high surface speed, depth, volume transport, and very clear waters. As the water passes through the Gulf of Mexico, it is known as the Gulf Loop Current because it flows far to the north, loops anticyclonically (clockwise) and flows south along the continental slope off Florida, and finally loops again cyclonically as it turns to exit into the Straits of Florida. In the Straits of Florida the current is called the Florida Current, a name it retains until about the latitude of Cape Canaveral, after which is it usually called the Gulf Stream. The Gulf Loop Current forms large (186-mi or 300-km diameter) anticyclonic eddies at an average rate of every 10–11 months; these eddies drift into the western Gulf of Mexico, spinning down and releasing their momentum, heat, and salt into the basin. Unlike the European Mediterranean whose deep waters are formed locally, the subsurface waters of the Intra-Americas Sea can be traced upstream as far south as Antarctica (Antarctic Intermediate Water) and as far east as the Canary Islands (Subtropical Underwater). Deep water flowing southward along the eastern boundary of the Intra-Americas Sea is known as the Deep Western Boundary Current, which has its source in the Arctic. Thus the Intra-Americas Sea’s oceanic circulation is truly part of a global pattern. See GULF OF MEXICO; GULF STREAM; OCEAN CIRCULATION. Surrounding land areas. Land surrounding the Intra-Americas Sea is often fringed by coral reefs, and there are five tectonic plates that influence its geophysics: the North American, South American,
Vertebrate digestive tracts, showing structure of fish, amphibian, and mammalian intestines. (a) Shark. (b) Perch. (c) Frog. (d) Guinea pig. (After A. S. Romer, The Vertebrate Body, 3d ed., Saunders, 1962)
Caribbean, Cocos, and Nazca plates. Earthquakes are not uncommon, nor are the associated seismic sea waves (tsunamis) that often cause extensive coastal destruction but that are usually considered to be only Pacific Ocean phenomena. The coastal lands are often the site of extensive mangrove forests; the coastal waters are highly productive in shrimp, demersal fishes, mollusks, and lobster; and the coastal and littoral zones are rich in mineral resources, particularly petroleum. River discharge from several South American rivers, notably the Orinoco and Amazon, drifts through the Intra-Americas Sea and carries materials thousands of kilometers from the deltas. The large deltas are heavily impacted by anthropogenic activities, but they remain the source of rich fisheries and plankton communities. Because of the small tidal range in the Intra-Americas Sea, most deltas are wind-dominated geological features. See DELTA; EARTHQUAKE; MARINE FISHERIES; PLATE TECTONICS; REEF; TIDE; TSUNAMI. George A. Maul Bibliography. G. A. Maul (ed.), Climatic Change in the Intra-Americas Sea, United Nations Environment Programme, 1993; C. N. K. Mooers and G. A. Maul, Intra-Americas Sea Circulation, The Sea, vol. 2, New York, pp. 183-208, 1998.
Intron A segment of deoxyribonucleic acid (DNA) transcribed into ribonucleic acid (RNA) as part of a longer strand of RNA, called heterogeneous nuclear RNA (hnRNA), but that does not survive the processes of messenger RNA (mRNA) maturation.
Hence, genes are split into portions, termed exons, that appear in mRNAs or in structural RNAs, and introns. During the maturation of pre-mRNA to mature mRNA, a series of control processes occur that specify and narrow down the set of mRNAs that become functionally active in a given cell. These processes multiply the species of transcripts through the differential combination of exons and the inclusion of parts of introns into the functional mRNAs; thus, several mRNA products can be generated from a single gene in a controlled manner. For example, certain exons may be included or spliced out during mRNA maturation (alternative splicing of the same hnRNA molecule), or two different hnRNA molecules may form during the splicing reaction of one mRNA transcript, a process termed transsplicing. See DEOXYRIBONUCLEIC ACID (DNA); EXON; RIBONUCLEIC ACID (RNA). Split genes. Split genes were discovered in 1977 during analyses of mammalian-virus DNAs and their mRNA products. Shortly thereafter, it was shown that an intrinsic mammalian gene, for beta hemoglobin, also has protein-coding regions separated by lengths of DNA that are not translated into protein. Such discoveries were made through techniques of recombinant DNA research, and since 1978 it has become clear that most genes in eukaryotes, and a few in prokaryotes, are split. These include not just a large number of different protein-coding genes but also genes encoding transfer RNAs (tRNAs) in such diverse eukaryotes as yeast and frogs, and genes encoding structural RNAs of ribosomes in some protozoa. Introns are not limited to genes in cell nuclei but are also found in mitochondrial genes of lower
eukaryotes and in some chloroplast genes. Characteristics. The number of introns in a gene varies greatly, from one in the case of structural RNA genes to more than 50 in the collagen gene. The lengths, locations, and compositions of introns also vary greatly among genes. However, in general these sizes and locations, but not the DNA sequence, are comparable in homologous genes in different organisms. The implication is that introns became established in genes early in the evolution of eukaryotes; further, while their nucleotide sequence appears to be unimportant, their existence, positions, and sizes are significant. As an example, the genes of kinases involved in important signal transduction pathways in eukaryotes—for example, the mitogen-activated protein kinases (JNK and p38)—are interrupted by introns localized primarily in the conserved kinase subdomains. An analysis of these genes in sponges (Porifera) revealed that the positions of the introns are highly conserved from sponges to humans; moreover, in Caenorhabditis elegans and Drosophila melanogaster they are found, if present, at the same positions.
Proposed generation of mosaic proteins for sponge RTK. The gene coding for a donor protein d-P which consists of an lg-like domain [phase I] undergoes modularization [phase II] by insertion of introns [In]. One of the created exons, Ex-2, is duplicated (forming Ex-2,1 and Ex-2,2) [phase III], and the product Ex-2,1 is liberated [phase IV] and inserted as a module into the precursor gene of RTK [p-RTK]; the process of module insertion by exon shuffling is completed [phase V]. After duplication of the two modules, the sponge RTK gene, composed of two Ig-like domains [Ex-2,1∗ and Ex-2.1∗∗], is completed [phase VI] and the mosaic protein is formed.
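As a concrete, purely hypothetical illustration of the split-gene organization described above, the Python sketch below removes intron segments from an invented pre-mRNA and joins the exons; skipping an exon yields a different mature transcript, as in the alternative splicing discussed at the start of this article. The sequences and coordinates are made up for illustration and do not correspond to any real gene.

# Toy model of splicing: exons (uppercase) are retained, introns (lowercase) are removed.
# The introns begin with "gu" and end with "ag", echoing the conserved junction
# sequences mentioned in the text; everything else here is invented.
exon1, exon2, exon3 = "AUGGCU", "CCAGAU", "UGGUAA"
intron1, intron2 = "guaaguacuuaacag", "gugaguauuuacag"

pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3

# Exon coordinates (start, end) computed from the segment lengths above.
e1 = (0, len(exon1))
e2 = (e1[1] + len(intron1), e1[1] + len(intron1) + len(exon2))
e3 = (e2[1] + len(intron2), e2[1] + len(intron2) + len(exon3))

def splice(seq, exon_coords):
    """Drop the introns: keep only the listed exon segments and join them."""
    return "".join(seq[start:end] for start, end in exon_coords)

print(splice(pre_mrna, [e1, e2, e3]))  # constitutive splicing -> AUGGCUCCAGAUUGGUAA
print(splice(pre_mrna, [e1, e3]))      # exon 2 skipped (alternative splicing) -> AUGGCUUGGUAA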
RNA splicing. Three types or mechanisms of RNA splicing have been identified. One involves tRNA, wherein removal of a single intron from tRNA precursors is accomplished by a splicing enzyme complex that recognizes a particular RNA secondary structure, or shape. Another was discovered in studies of protozoan ribosomal RNA (rRNA), and also has been shown to be a part of the maturation of both rRNA and some mRNAs in yeast mitochondria. Excision of the intron from the precursor of Tetrahymena rRNA, for example, is an autocatalytic process in which the precursor RNA folds into a structure in which the intron is a loop, then cleaves itself at the ends of the intron and ligates the two parts of the mature rRNA. The reactions proceed without the involvement of protein. Processing of RNA in mitochondria takes one of two forms. The first is the autocatalytic mechanism mentioned above, and the second involves splicing enzymes, called RNA maturases, that are actually coded for within introns of genes for other mitochondrial proteins. In an interesting form of regulation of gene expression, a maturase mRNA is translated into an enzyme that destroys its own mRNA while splicing together another mRNA. See MITOCHONDRIA. The third splicing mechanism has been characterized for eukaryotic mRNAs of nuclear origin. Although little DNA sequence homology exists among introns in genes that are expressed as mRNA, there is a consistent relatedness among them at their ends (their junctions with flanking protein-coding sequences) that is involved in the removal of intron sequences from gene transcripts during RNA processing. A class of particles known as small nuclear ribonucleoproteins (SnRNPs) have been identified as additional participants in the processing of mRNAs from larger precursors. They interact with exon– intron junctions to loop out introns and facilitate RNA cleavage and exon ligation. In bacteria, certain RNAs in bacteriophage-infected cells result from the same sort of autocatalytic splicing described for protozoan rRNA. Role. Speculation on the roles and the evolution of introns is mostly based on correlations that have been seen between domains of protein structure and the exons of genes that are defined by intervening introns. Domains in proteins are regions of a molecule that can be distinguished from others on structural and functional grounds. For example, the enzyme tyrosine kinase (TK) has several domains: among them is one that binds the substrate ATP and another that transfers the phosphate group to the protein. The presence of introns in eukaryotic structural genes and the absence in prokaryotes were explained by two different hypotheses. First, the “intron-late” theory suggests that introns arose from transposable elements which were inserted later into genes of eukaryotes. Second, the “intron-early” theory assumes that introns are the relics of the RNA world, implying that the genes-in-pieces organization of the eukaryotic genome reflects a primitive original form. The general conclusion of the intron-early theory was that the exons correspond to the building
Invasion ecology blocks from which the genes were composed by intronic recombination. Mosaic/modular proteins and exon shuffling. Mosaic proteins or modular proteins are formed by fusion of two or more gene segments that encode different protein domains. They are assumed to be young proteins and unique to eukaryotes, especially to multicellular animals. Seeking to explain the evolution of mosaic proteins, the modularization hypothesis was formulated. It was assumed that modules suitable for exon shuffling are created by insertion of introns into the genome adjacent to exons. Members of mosaic proteins contain extracellular domains, such as receptor proteins, including the receptor protein kinases. Mosaic proteins have been identified and described from metazoan phyla including sponges. In contrast to the observations with ancient protein genes, there is strong evidence that in higher eukaryotes, in Metazoa, exon shuffling has been used to create the modular assembly of many multidomain proteins. To test the hypothesis that the process of exon shuffling contributed to the explosive nature of metazoan radiation, studies on the gene organization of sponges, which are extant metazoans from the period prior to the Cambrian Explosion, have been performed. In one study, it was established that the sponge genes of the receptor tyrosine kinases (RTKs), enzymes found exclusively in Metazoa, contain only two introns located in the region encoding the extracellular part of the RTKs. In contrast, all metazoan genes for RTKs found in phyla younger than the Porifera contain several introns in the TK domain. This surprising fact provided experimental support for the view that ancient genes were not “in pieces.” It has been suggested that introns became established in the genes of eukaryotes (and to a limited extent in bacteria) because they facilitate a genetic shuffling or rearrangement of portions of genes which encode various functional units, thus creating new genes with new combinations of properties. The RTK gene of the sponge Geodia cydonium codes for a mosaic protein which is composed of several domains/modules: the Pro-Ser-Thr-rich domain, the two Ig-like domains, and the TK domain. The Ig-like domains are modules according to strict nomenclature. A schematic representation of the formation of the Ig-like module and its subsequent insertion into the RTK precursor gene is shown in the illustration. See PORIFERA. Recombination. Genetic recombination within introns, that is, between coding units rather than within them, provides a means of genetic evolution via wholesale reassortments of functional subunits or building blocks, rather than by fortuitous recombinations of actual protein-coding DNA sequences. See GENE; GENETIC CODE; RECOMBINATION (GENETICS). Werner E. G. Muller; ¨ Peter M. M. Rae Bibliography. W. F. Doolittle, Genes in pieces: Were they ever together?, Nature, 272:581–582, 1978; V. Gamulin et al., Experimental indication in favor of the introns-late theory: The receptor tyrosine kinase
gene from the sponge Geodia cydonium, J. Mol. Evol., 44:242–252, 1997; W. Gilbert and M. Glynias, On the ancient nature of the introns, Gene, 135:137–144, 1993; B. Lewin, Genes VII, Oxford University Press, 2000; W. E. G. Müller (ed.), Molecular Evolution: Towards the Origin of Metazoa (Progress in Molecular and Subcellular Biology 21), Springer-Verlag, Berlin, 1998; W. E. G. Müller (ed.), Molecular Evolution: Evidence for Monophyly of Metazoa (Progress in Molecular and Subcellular Biology 19), Springer-Verlag, Berlin, 1998; W. E. G. Müller et al., Conservation of the positions of metazoan introns from sponges to humans: Evolutionary implications, Gene, 295:299–309, 2002; L. Patthy, Protein Evolution by Exon-Shuffling, Springer-Verlag, New York, 1995.
Invasion ecology The study of the establishment, spread, and ecological impact of species translocated from one region or continent to another by humans. Biological invasions have gained attention as a tool for basic research, used to study the ecology and evolution of populations and of novel biotic interactions; and as a conservation issue tied to the preservation of biodiversity. The invasion of nonindigenous (also called exotic, alien, or nonnative) species is a serious concern for those charged with managing and protecting natural as well as managed ecosystems. For example, nonindigenous species negativley affect over one-third of the U.S. Endangered Species List. See ECOLOGY; POPULATION ECOLOGY. Species. Ecologists make a distinction between introduced species, meaning any species growing outside its natural habitat including cultivated or domesticated organisms, and invasive species, meaning the subset of introduced species that establish free-living populations in the wild. The great majority of introduced species (approximately 90% as estimated from some studies) do not become invasive. While certain problem invaders, such as the zebra mussel (Dreissena polymorpha), exact enormous economic and ecological costs, other introduced species are generally accepted as beneficial additions, such as most major food crops. Routes of entry. Species arrive in a new region by a variety of means, including both intentional and accidental introductions. Intentional plant introductions have been promoted primarily by the horticulture industry to satisfy the public’s desire for novel landscaping. However, plants have also been introduced for agriculture, for silviculture, and for control of soil erosion. Intentional animal introductions include game species brought in for sport hunting or fishing [such as ringneck pheasant (Phasianus colchicus) in North America, water buffalo (Bubalus bubalis) in Australia, and brown trout (Salmo trutta) in New Zealand]. Unlike these examples, intentional introductions can also include species that are not necessarily intended to form self-sustaining populations, such as those promoted by the aquarium or
Invasion ecology pet trade. Pets may be released into the wild when their disenchanted owners look for an easy method of disposal. Species introduced accidentally are “hitchhikers.” Shipping ballast has been a major vector, first in the form of soil carrying terrestrial invertebrates and plant seeds or rhizomes, and more recently in the form of ballast water carrying planktonic larvae from foreign ports. While many species are introduced in ballast or by similar means (such as packing material before the use of styrofoam), hitchhikers can also be explicitly unwanted parasites that bypass importation and quarantine precautions. For example, many nonindigenous agricultural weeds have been imported in contaminated seed lots. With the burgeoning of international travel and the rapid development of transportation technology, there are many new routes that species may take. In the past, global patterns of established nonindigenous species reflected the dominance of particular trade routes, for example directional movement from Europe to North America. However, trade and travel now occur between almost any two points on the planet, opening up entirely new patterns of species interchange. Predicting invaders. Given that most introduced species will not become invaders, an important question concerns the extent to which predictions can be made as to which species will invade. The fact that the same species appear over and over invading different regions [such as the black rat (Rattus rattus) and water hyacinth (Eichornia crassipes)] is evidence that invasions are not simply random events; rather, there is such a thing as a “good invader.” Extrapolating from groups of successful invaders, biologists have compiled lists of traits that seem to confer invasiveness such as rapid growth, high reproductive rates, and tolerance to a broad range of conditions. The flaw with this approach is that saying most invaders grow rapidly is not equivalent to saying most species that grow rapidly will be invasive. To assert the latter, one has to compare the probability of invasion success for species that grow rapidly compared to species that grow slowly. This is a much more difficult task because it requires knowledge of how many species were introduced but were not successful, information that is particularly elusive for accidental introductions. However, being able to predict which species will invade has critical implications for policy regulating the importation of nonindigenous species. Sites of invasions. Certain types of habitats seem to have higher numbers of established nonindigenous species than others. The characteristics that make a site open to invasion must be determined. For example, islands are notably vulnerable to invasions. Islands usually have fewer resident species to begin with, leading to the conjecture that simpler systems have less biotic resistance to invaders. That is, an introduced species is less likely to be met by a resident competitor, predator, or pathogen capable of excluding it. The idea of biotic resis-
resistance is also consistent with the idea that complexity confers stability in natural systems. See INVASION ECOLOGY. A second generalization about invasibility is that ecosystems with high levels of anthropogenic disturbance, such as agricultural fields or roadsides, also seem to be more invaded. Increased turnover of open space in these sites could provide more opportunities for the establishment of new species. An alternative explanation is that many species that adapted to anthropogenic habitats in Europe simply tagged along as humans re-created those habitats in new places. Those species would naturally have an advantage over native species at exploiting human disturbances. A final suggestion by proponents of ecosystem management is that disturbance (including, in this context, a disruption of natural disturbance regimes, for example, fire suppression) weakens the inherent resistance of ecosystems and promotes invasion. Spread. Early ecological models represented the spread of an invading species across a landscape with a summation of two terms: one describing growth in numbers, and the other describing movement in space, for example:

Rate of change in population density = rate of population growth due to reproduction + rate of dispersal

or

∂N/∂t = rN(1 − N/K) + D ∂²N/∂x²
where N is population density, t is time, r is the intrinsic rate of population increase, K is the carrying capacity, D is the diffusion coefficient describing movement, and x is position from the origin or release point. For such simple models, mathematicians have shown that the speed of an invasion, measured in terms of new area occupied, is given by

Kilometers/year ∝ rate of population growth at low density × rate of dispersal

or

C = 2√(rD)
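A minimal numerical sketch of this model follows; the growth rate, diffusion coefficient, grid, and time step below are invented purely for illustration and are not drawn from any real invasion. The sketch integrates the one-dimensional equation above from a localized introduction at one edge and compares the simulated front speed with the theoretical value 2√(rD).

```python
# Illustrative sketch: integrate the logistic growth plus diffusion model and
# compare the simulated invasion front speed with C = 2*sqrt(r*D).
# All parameter values here are hypothetical.
import numpy as np

r, K, D = 0.5, 1.0, 2.0            # intrinsic growth rate, carrying capacity, diffusion coefficient
L, nx = 400.0, 2000                # one-dimensional landscape
dt, t_end = 0.005, 60.0

x = np.linspace(0.0, L, nx)
dx = x[1] - x[0]
N = np.where(x < 5.0, K, 0.0)      # invader introduced at one edge

def front_position(N, threshold=0.5):
    """Rightmost position where density exceeds half the carrying capacity."""
    idx = np.where(N > threshold * K)[0]
    return x[idx[-1]] if idx.size else 0.0

times, positions = [], []
t = 0.0
while t < t_end:
    lap = (np.roll(N, 1) - 2 * N + np.roll(N, -1)) / dx**2
    lap[0] = lap[1]; lap[-1] = lap[-2]          # crude no-flux boundaries
    N = N + dt * (r * N * (1 - N / K) + D * lap)
    t += dt
    times.append(t)
    positions.append(front_position(N))

# Estimate the asymptotic spread rate from the second half of the run.
half = len(times) // 2
speed = np.polyfit(times[half:], positions[half:], 1)[0]
print(f"simulated front speed = {speed:.2f}, theory 2*sqrt(rD) = {2*np.sqrt(r*D):.2f}")
```

With these made-up parameters the theoretical speed is 2.0, and the simulated front settles onto a constant rate close to that value once the initial transient has passed.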
This rate of spread becomes constant after an invasion has been allowed to proceed for a sufficiently long time. The prediction of a constant rate of spread has been compared to empirical data on the movement of invading species over time, and it successfully describes the range expansion of a surprisingly diverse set of organisms, including the collared dove (Streptopelia decaocto) and the small white cabbage butterfly (Pieris rapae). A consistent result derived from the theoretical work on range expansion is that long-distance dispersal events can have an enormous effect on the
rate of spread. Moreover, the growth of newly established satellite populations can dominate the overall increase of a species in numbers or area. The obvious implication for management is that small, seemingly insignificant patches or outlying populations of a noxious weed or pest should be the focus of control efforts. Impact. Invasive species can have several different types of impacts. First, they can affect the traits and behavior of resident organisms (for example, causing a shift in diet, size, or shape of the native species they encounter). Second, impacts can occur at the level of the population, either by changing the abundance of a native population or by changing its genetic composition. Hybridization between an invader and a closely related native can result in introgression and genetic pollution. As in the case of several endangered fish species, the endpoint can be the de facto extinction of the native species when the unique aspects of its genome are overwhelmed. Third, impacts can occur at the level of ecological communities. When individual populations are reduced or even driven extinct by competition or predation by an invasive species, the result is a decrease in the overall biodiversity of the invaded site. Finally, invaders can impact not only other species but the physical characteristics of an ecosystem as well. Shrubs that fix nitrogen with the help of symbiotic bacteria can significantly increase the nutrient pools in soil, while some deep-rooted species can dry down the water table such that native species can no longer survive. There are two main contributing factors in determining which species have the biggest impacts: abundance and special characteristics. Invaders that reach extremely high density, such as the zebra mussel at up to 4500 per square meter, simply overwhelm all other organisms. Other species have special traits that result in an impact out of proportion to their numbers. For example, the cane toad (Bufo marinus) is a noxious pest in Australia in part because its toxic skin can be lethal to potential native predators. Control. Because of the economic and conservation importance of nonindigenous species, much of invasion ecology focuses on the prevention, eradication, and control of invaders, and the restoration of sites after control. Research has emphasized the importance of early detection and eradication of problem species because of the sheer difficulty of manual and chemical control methods once populations have reached a large size. Biological control has been touted as an environmentally friendly alternative to herbicides and pesticides. One type of biological control uses native species of insects or microbes as biopesticides to turn back invasive host populations. A second type, termed classical biological control, involves the introduction of new nonindigenous species that attack the invader in its home range. Therefore, classical biological control is also accompanied by the same risks as any intentional introduction. The uncertainties due to the current poor pre-
dictive ability of the science of invasion ecology have produced an active debate on the wisdom of using introductions to stem previous introductions. See ALLELOPATHY; ECOLOGICAL COMMUNITIES; ECOLOGICAL SUCCESSION; SPECIATION; SPECIES CONCEPT. Ingrid M. Parker Bibliography. R. Hengeveld, Dynamics of Biological Invasions, Chapman and Hall, 1989; H. A. Mooney and J. A. Drake, Ecology of Biological Invasions of North America and Hawaii, SpringerVerlag, 1986; J. L. Ruesink et al., Guilty until proven innocent: Reducing the risks of non-indigenous species introductions, Bioscience, 45(7):465–477, 1995; M. Williamson, Biological Invasions, Chapman and Hall, 1996.
Inventory control The process of managing the timing and the quantities of goods to be ordered and stocked, so that demands can be met satisfactorily and economically. Inventories are accumulated commodities waiting to be used to meet anticipated demands. Inventory control policies are decision rules that focus on the tradeoff between the costs and benefits of alternative solutions to questions of when and how much to order for each different type of item. Benefits of carrying inventories. For a firm, some of the possible reasons for carrying inventories are: uncertainty about the size of future demands; uncertainty about the duration of lead time for receiving deliveries; provision for greater assurance of continuing production, using work-in-process inventories as a hedge against the failure of some of the machines feeding other machines; and speculation on future prices of commodities. Some of the other important benefits of carrying inventories are reduction of ordering costs and production setup costs (these costs are less frequently incurred as the sizes of the orders are made larger, which in turn creates higher inventories); price discounts for ordering large quantities; shipping economies; and maintenance of stable production rates and work-force levels which otherwise could fluctuate excessively due to variations in seasonal demand. Holding costs. The benefits of carrying inventories have to be compared with the costs of holding them. Holding costs include the following elements: cost of capital for money tied up in inventories; cost of owning or renting the warehouse or other storage spaces; materials handling equipment and labor costs; costs of potential obsolescence, pilferage, and deterioration (these also involve the costs of insurance, security, and protection from natural causes such as humidity or extreme temperatures); property taxes levied on inventories; and cost of installing and operating an inventory control policy. Pareto analysis. Inventories, when listed with respect to their annual costs, tend to exhibit a similarity to Pareto’s law and distribution. A small percentage of the product lines may account for a very large
share of the total inventory budget (they are called class A items, or sometimes the vital few). Aside from the class A items, and in the opposite direction, there exists a large percentage of product lines which tend to constitute a much smaller portion of the budget (they are called class C items). The remaining 20 to 30% of the items in the middle are called class B items. The ABC analysis may help management direct more of its attention to important issues and may lead to greater cost-effectiveness. For example, if the inventory levels are checked at fixed time intervals, then the status of type A items may be reported weekly, type B items biweekly, and type C items monthly. See PARETO’S LAW. Mathematical inventory theory. Mathematical inventory theory dates back to 1915 to the work of F. Harris of the Westinghouse Corporation, who derived the simple lot size formula. For independent items, economic order quantity (EOQ) means the replenishment quantity (Q) which has to be ordered to minimize the sum of the costs of holding (h dollars per item per period) and setups (and/or ordering costs: K dollars per order). If quantity discounts are not available, and if the demand (d items per period) is known and continues at the same rate through time, then the average cost per period, Kd/Q + hQ/2, is minimized by setting Q = √(2Kd/h). This basic model was later extended to allow for various other conditions. For example, if the replenishment of the lot cannot take place instantaneously, but rather happens at a speed of r items per period, then the optimal value of Q becomes √(2Kd/[h(1 − d/r)]). Other extensions of the EOQ model have been made to cover cases such as: quantity discounts in purchase prices; allowance for back orders at a cost; limited resources or facilities shared by otherwise independent items; inflation; time-varying deterministic demands and production capacities; and multiple echelons. As for problems where the demands involve uncertainty, substantial research work has been carried out to develop inventory policies which minimize the expected values of various cost functions. Provided that information regarding the probability density function of demand is known, elegant mathematical solutions are available for independent items on a single echelon. There have also been some findings for certain multiitem, multiechelon problems; however, the amount of further research that needs to be carried out to find optimal solutions for complicated everyday inventory problems is substantial. Inventory control problems of everyday life involve many complications that can include various combinations of sequence-dependent setup times, multiitems, multiechelons with stochastic lead times, joint orders, dependent probabilistic demands, and situations where adequate information on probability density functions is not available. Even worse, the shape of the unknown probability density function may be time-varying. Regardless, mathematical inventory theory is valuable because by using the insight it provides to simpler problems, good heuristics for more complicated everyday problems can be designed.
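As a brief illustration of the lot-size formulas above, the following sketch computes the economic order quantity for instantaneous and for finite-rate replenishment, together with the corresponding average cost per period. The cost and demand figures are hypothetical and chosen only for the example.

```python
# Illustrative EOQ calculations for the formulas quoted above.
from math import sqrt

def eoq(K, d, h):
    """Economic order quantity Q* = sqrt(2*K*d/h) for instantaneous replenishment."""
    return sqrt(2 * K * d / h)

def eoq_finite_rate(K, d, h, r):
    """Optimal lot size when the lot is produced at a finite rate r > d."""
    return sqrt(2 * K * d / (h * (1 - d / r)))

def average_cost(Q, K, d, h):
    """Setup/ordering cost plus holding cost per period, K*d/Q + h*Q/2."""
    return K * d / Q + h * Q / 2

K, d, h, r = 100.0, 1200.0, 2.0, 4800.0   # hypothetical cost and demand figures
Q1 = eoq(K, d, h)
Q2 = eoq_finite_rate(K, d, h, r)
print(f"EOQ = {Q1:.1f} units, cost per period = {average_cost(Q1, K, d, h):.2f}")
print(f"EOQ with finite production rate = {Q2:.1f} units")
```

At the optimum of the basic model the ordering cost and the holding cost per period are equal, which the printed figures confirm for the example values.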
Simulation. Computer simulation is also used for such purposes. By simulating inventory systems and by analyzing or comparing the performance of different decision policies, further insights can be acquired into the specific problem on hand and a more cost- and service-effective inventory control system can be developed. Ordering policies. Continuous-review and fixed-interval are two different modes of operation of inventory control systems. The former means the records are updated every time items are withdrawn from stock. When the inventory level drops to a critical level called the reorder point (s), a replenishment order is issued. Under fixed-interval policies, the status of the inventory at each point in time does not have to be known. The review is done periodically (every t periods). Many policies for determining the quantity of replenishment use either fixed-order quantities or maximum-order levels. Under fixed-order quantities for a given product, the size of the replenishment lot is always the same (Q). Under maximum-order levels, the lot size is equal to a prespecified order level (S) minus the number of items (of that product) already in the system. Different combinations of the alternatives for timing and lot sizes yield different policies known by abbreviations such as (s, Q), (s, S), (s, t, S), and (t, S). Other variations of the form of inventory control policies include coordination of timing of replenishments to achieve joint orders, and adjustment of lot sizes to the medium of transportation. Forecasting. Uncertainties of future demand play a major role in the cost of inventories. That is why the ability to better forecast future demand can substantially reduce the inventory expenditures of a firm. Conversely, using ineffective forecasting methods can lead to excessive shortages of needed items and to high levels of unnecessary ones. Product design. Careful product design can also reduce inventory costs. Standardization, modularity, introduction of common components for different end products, and extension of the use of interchangeable parts and materials can lead to substantial savings in inventory costs. MRP system. Material requirements planning (MRP) systems (which are production-inventory scheduling software that makes use of computerized files and data-processing equipment) have received widespread application. Material requirements planning systems have not yet made use of mathematical inventory theory. They recognize the implications of dependent demands in multiechelon manufacturing (which includes lumpy production requirements). Integrating the bills of materials, the given production requirements of end products, and the inventory records file, material requirements planning systems generate a complete list of a production-inventory schedule for parts, subassemblies, and end products, taking into account the lead-time requirements. Material requirements planning has proved to be a useful tool for
manufacturers, especially in assembly operations. See MATERIAL RESOURCE PLANNING. Kanban and just-in-time (JIT). While material requirements planning systems were being developed in the United States, some Japanese manufacturers achieved widely acclaimed success with a different system. By producing components “just in time” to be used in the next step of the production process, and by extending this concept throughout the production line so that even the finished goods are delivered just in time to be sold, they obtained substantial reductions in inventories. One of the key factors for establishing just-in-time is altering the manufacturing process to drastically reduce the setup times and simplifying the ordering and procurement process so that ordering costs are cut down. The idea is to enable the producer to operate with small lot sizes, which get produced when the need arises (and not before). Once just-in-time is established, an information system is used to determine the timing and quantities of production. Card signals, that is, visible records (in Japanese, Kanban), are used to specify withdrawals from preceding production stages, and to order for production the number and type of items required. Because small batches of production have become economical, the production orders can be filled just in time. Advocates of Kanban characterize it as a pull process and criticize material requirements planning as a push system. Though Kanban is a simple idea and yields an adaptive–flexible production system, its appropriateness hinges on whether setup and ordering costs have been drastically reduced so as to allow small production batches. Other systems. An example of application of computerized inventory control systems is IBM’s Communication Oriented Production and Informational Control Systems (COPICS), which covers a wide scope of inventory-control-related activities, including demand forecasting, materials planning, and even simulation capabilities to test different strategies. Taking into account the continuing decreases in the costs of computer hardware, advances in computerized inventory control systems can be expected to find even wider application. See COMPUTER-INTEGRATED MANUFACTURING. Ali Dogramaci Bibliography. L. S. Aft, Production and Inventory Control, 1987; R. G. Brown, Materials Management Systems, 1977, reprint 1984; D. W. Fogarty, J. H. Blackstone, and T. R. Hoffmann, Production and Inventory Management, 2d ed., 1991; R. T. Lubben, Just-in-Time Manufacturing: An Elegant Solution, 1988; S. L. Narasimhan, D. N. McLeavey, and P. J. B. Billington, Production Planning and Inventory Control, 1995; R. J. Schonberger, Japanese Manufacturing Techniques: Nine Hidden Lessons in Simplicity, 1982; R. J. Schonberger, World Class Manufacturing Casebook: Implementing JIT & TQC, 1995; E. Silver and R. Petersen, Decision Systems for Inventory Management and Production Planning, 2d ed., 1995.
Inverse scattering theory A theory whose objective is to determine the scattering object, or an interaction potential energy, from the knowledge of a scattered field. This is the opposite problem from direct scattering theory, where the scattering amplitude is determined from the equations of motion, including the potential. The equations of motion are usually linear (operator-valued) equations. An example of an inverse problem was given in an article by M. Kac entitled “Can One Hear the Shape of a Drum?” For some drums this question can be answered yes, for yet others no, but for many the question remains unanswered. See SCATTERING EXPERIMENTS (ATOMS AND MOLECULES); SCATTERING EXPERIMENTS (NUCLEI). Inverse scattering theories can be divided into two types: (1) pure inverse problems, when the data consist of complete, noise-free information of the scattering amplitude; and (2) applied inverse problems, when incomplete data which are corrupted by noise are given. Many different applied inverse problems can be obtained from any pure inverse problem by using different band-limiting procedures and different noise spectra. The difficulty of determining the exact object which produced a scattering amplitude is evident. It is often a priori information about the scatterer that makes the inversion possible. Pure inverse scattering theory. Much of the basic knowledge of systems of atoms, molecules, and nuclear particles is obtained from inverse scattering studies using beams of different particles as probes. These probes include beams of charged particles utilizing electromagnetic interactions, weakly interacting particle beams such as neutrinos and antineutrinos, and beams of strongly interacting particles such as protons, neutrons, pions, kaons, and others. For the Schrödinger equation with spherical symmetry or in one dimension, there is an exact solution of the inverse problem by I. M. Gel’fand and B. M. Levitan. Given complete spectral data, including (1) the energy Ei, and a normalization constant Ci for each bound state, i = 1, 2, . . . , N, and (2) a weight function w(k) for the continuous states, a formula exists to calculate a function H(r, t). From this function the Gel’fand-Levitan equation (1),

K(r, t) = −H(r, t) − ∫₀^r K(r, s)H(t, s) ds    (1)
can be solved for K. Then the potential V sought can be calculated from Eq. (2),

V(r) = 2 dK(r, r)/dr    (2)

Furthermore, given suitable discrete and continuous approximate solutions φ0, the exact solutions are given by Eq. (3), where k is the wave vector:

ψ(r, k) = φ0(r, k) + ∫₀^r K(r, t)φ0(t, k) dt    (3)

This completely solves these inverse problems. See ELEMENTARY PARTICLE; INTEGRAL EQUATION; NONRELATIVISTIC QUANTUM THEORY; QUANTUM FIELD THEORY; QUANTUM MECHANICS.
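The structure of Eqs. (1) and (2) can be exercised numerically, as in the sketch below: for each fixed r, the integral equation becomes a small linear system in the values K(r, s), and the potential then follows by differencing the diagonal K(r, r). The input kernel H used here is an arbitrary smooth function chosen only to illustrate the discretization; it is not derived from real spectral data, and the grid sizes are likewise made up.

```python
# Illustrative discretization of the Gel'fand-Levitan equation
#   K(r,t) = -H(r,t) - integral_0^r K(r,s) H(t,s) ds,   followed by
#   V(r) = 2 dK(r,r)/dr  via a finite difference of the diagonal.
import numpy as np

def H(r, t):
    return 0.3 * np.exp(-(r + t))        # stand-in kernel, not real spectral data

def solve_K_row(r, n=200):
    """Solve the discretized equation on [0, r] for the values K(r, s_j)."""
    s = np.linspace(0.0, r, n)
    w = np.full(n, r / (n - 1))          # trapezoid quadrature weights
    w[0] = w[-1] = 0.5 * r / (n - 1)
    Hmat = H(s[:, None], s[None, :])     # Hmat[i, j] = H(t_i, s_j)
    A = np.eye(n) + Hmat * w             # (I + H W) K = -H(r, t)
    K_row = np.linalg.solve(A, -H(r, s))
    return K_row

r_grid = np.linspace(0.2, 3.0, 60)
K_diag = np.array([solve_K_row(r)[-1] for r in r_grid])   # K(r, r)
V = 2.0 * np.gradient(K_diag, r_grid)                     # Eq. (2), numerically
print(V[:5])
```

In practice a physically meaningful H would be built from measured bound-state data and the weight function w(k), and a finer grid would be used; the sketch only shows that each step of the formal solution reduces to routine linear algebra.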
An application of inverse scattering to one-dimensional nonlinear evolution equations was given by C. S. Gardner and colleagues. They studied one-dimensional reflectionless potentials V(x) = V(x, 0) in a time-independent linear Schrödinger equation with one, two, and so forth bound states. The time independence assures that the spectrum remains invariant under translations in time, so that the corresponding flow is isospectral. Then a second linear operator was used to generate translations in time. At large spatial distances x, the regular solution to the second time-dependent equation gave a time-dependent reflection coefficient for scattering from the potential. Inverse scattering theory was then used to convert this reflection coefficient to a time-dependent potential V(x, t). The identification V(x, t) = φ(x, t) was observed to provide a solution to the nonlinear, dispersive Korteweg–de Vries equation (4),

φt + 6φφx + φxxx = 0    (4)

where φt = ∂φ/∂t, φxxx = ∂³φ/∂x³, and so forth.
Each bound state of the time-independent Schrödinger equation corresponds to the velocity of one soliton, a localized, finite-energy solitary wave with a superstability. In ordinary stability, a solitary wave would retain its identity after a small enough perturbation; two solitons retain their size, shape, and speed with only a shift in phase even after propagating through one another. This, their superstability, requires a computer calculation to test. A number of other nonlinear equations in one space dimension and one time dimension have soliton solutions, and they are experimentally observed in plasmas, quantum optics, fluids, superconductors, and many other low-dimensional systems. See SOLITON.
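As a small consistency check, not part of the original analysis, the sketch below evaluates the standard one-soliton profile φ(x, t) = (c/2) sech²[(√c/2)(x − ct)] and verifies by finite differences that it satisfies Eq. (4) to within discretization error. The speed c and the grid are arbitrary example values.

```python
# Check numerically that the textbook one-soliton profile satisfies
#   phi_t + 6*phi*phi_x + phi_xxx = 0   (Korteweg-de Vries equation).
import numpy as np

c = 2.0                                  # soliton speed, illustrative value

def phi(x, t):
    return 0.5 * c / np.cosh(0.5 * np.sqrt(c) * (x - c * t)) ** 2

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
dt = 1.0e-4
t = 0.3

p = phi(x, t)
p_t = (phi(x, t + dt) - phi(x, t - dt)) / (2 * dt)       # centered time derivative
p_x = np.gradient(p, dx)
p_xxx = np.gradient(np.gradient(p_x, dx), dx)            # repeated centered differences

residual = p_t + 6 * p * p_x + p_xxx
print("max |residual| =", np.abs(residual[5:-5]).max())   # small compared with the terms themselves
```

The residual is not exactly zero because the third derivative is approximated by repeated differencing, but it is several orders of magnitude smaller than the individual terms, which is the point of the check.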
Applied inverse scattering theory. A number of high-technology areas (nondestructive evaluation, medical diagnostics including acoustic and ultrasonic imaging, x-ray absorption and nuclear magnetic resonance tomography, radar scattering and geophysical exploration) use inverse scattering theory. Several classical waves including acoustic, electromagnetic, ultrasonic, x-rays, and others are used, extending the ability to “see” beyond visible light and beyond electromagnetic waves. For example, acoustic waves readily propagate through a thick metal container while scattering from imperfections in the wall. See BIOACOUSTICS, ANIMAL; COMPUTERIZED TOMOGRAPHY; MEDICAL ULTRASONIC TOMOGRAPHY; NONDESTRUCTIVE EVALUATION. All of the inverse scattering technologies require the solution to ill-posed or improperly posed problems. A model equation is well posed if it has a unique solution which depends continuously on the initial data. It is ill posed otherwise. The ill-posed problems which are amenable to analysis, called regularizable ill-posed problems, are those which depend discontinuously upon the data. This destroys uniqueness, although solutions (in fact, many solutions) exist. Direct scattering problems are well posed and involve solutions for a scattering amplitude function T from an integral equation of the form of Eq. (5),

T = V + V G0+ T    (5)

where V and G0+ are given. The pure inverse problem requires the solution of Eq. (6), with T and G0+ given,

V = −T − T G0+ V    (6)

However, applied inverse problems involve general equations (7),

Y = R · T    (7a)

Y = R · T + n    (7b)

where Y is the measured
quantity, T is the direct scattering amplitude, n is noise, and R is a resolution function. The resolution function represents the effects of the measuring apparatus, especially the limit of the range of frequencies accepted. It is Eqs. (7) which are regularizable ill-posed problems. The discontinuous dependence on data has as a consequence that two given input signals Y1 and Y2 can be as close together as desired everywhere, but can lead to solutions T1 and T2 which are as far apart as desired. The nonuniqueness means that these equations have (possibly infinite) families of solutions which are near the exact solution of Eq. (5) or (6). It is necessary to have prior knowledge about the scatterer to decide which of these solutions is the best (and even what “best” means here). The regularizer is a mathematical device for choosing one of this family. The solution chosen is called a quasisolution. The regularizer both removes the discontinuity of the dependence of T on Y and selects a unique quasisolution. The cost of regularization is that the quasisolution selected must have some nonzero minimum spatial resolution. It is impossible to determine any finer details about the scatterer than this spatial resolution. This limitation arises as follows: Let Et(N) be the truncation error, that is, the spatial resolution due to truncation of the spatial solutions at N functions of some chosen set (sines and cosines in Fourier series expansions). For a wide variety of scattering problems, it behaves as Eq. (8),

Et = a/N^b    (8)

where a and b are positive constants which depend upon the experiment being analyzed but not N. The error due to noise En behaves as Eq. (9),

En = cN^d    (9)

where c and d are positive constants which
depend on the experiment but not on N. For pure inverse problems, Et is the sole source of error so that if N is large enough the error is negligible. For applied inverse problems, both errors occur so that
the total error is given by Eq. (10),

E = aN^(−b) + cN^d    (10)

Example of use of Eqs. (8)–(10) to obtain the point of optimal truncation and minimum spatial resolution in an applied inverse scattering problem.

The illustration shows the truncation error Et, the noise error En, and the total error E, such as that given in Eq. (10). N0 is the point of optimal truncation. The corresponding value E0 = E(N0) is the minimum spatial resolution. The minimum of Eq. (10) is obtained from Eq. (11), so that N0 is given by Eq. (12):

(dE/dN)N0 = 0 = −abN0^(−b−1) + cdN0^(d−1)    (11)

N0 = (ab/cd)^(1/(d+b))    (12)

Since (d²E/dN²)N0 > 0, this is a minimum. As an example, let a = 10^4 c and d = b = 2, which gives N0 = 10 upon using Eq. (12). In this case, fewer than 10 terms will not yield all of the information on the scatterer available in the data; more than 10 will lose detail by allowing noise to pollute the signal. It is the goal of applied inverse scattering theory to understand these issues and to optimize performance for each technology. Each is different, and each serves as a rich source of challenging problems.
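A few lines of arithmetic reproduce the worked example above; only the ratio a/c enters N0, and the values used are those quoted in the text.

```python
# Optimal truncation point for the example a = 1e4*c, b = d = 2.
a_over_c, b, d = 1.0e4, 2.0, 2.0

N0 = (a_over_c * b / d) ** (1.0 / (b + d))      # Eq. (12), with a expressed through a/c
print("optimal truncation N0 =", N0)             # prints 10.0

# Total error of Eq. (10) in units of c, E/c = (a/c)*N**(-b) + N**d,
# evaluated on either side of N0 to confirm the minimum.
for N in (5, 10, 20):
    print(N, a_over_c * N ** (-b) + N ** d)
```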
Brian DeFacio Bibliography. M. Bertero and E. M. Pike (eds.), Inverse Problems in Scattering and Imaging, 1992; K. Chadan and P. C. Sabatier, Inverse Problems in Quantum Scattering Theory, 1977, reprint 1989; D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory, 1992; G. M. Gladwell, Inverse Problems in Scattering: An Introduction, 1993; K. I. Hopcraft, An Introduction to Electromagnetic Inverse Scattering, 1992; D. Sattinger et al., Inverse Scattering and Applications, 1991.

Inverse-square law Any law in which a physical quantity varies with distance from a source inversely as the square of that distance. When energy is being radiated by a point source (see illus.), such a law holds, provided the space between source and receiver is filled with a nondissipative, homogeneous, isotropic, unbounded medium. All unbounded waves become spherical at distances r, which are large compared with source dimensions so that the angular intensity distribution on the expanding wave surface, whose area is proportional to r², is fixed. Hence emerges the inverse-square law.

Point source S emitting energy of intensity I. The inverse-square law states that I1/I2 = r2²/r1².

Similar reasoning shows that the same law applies to mechanical shear waves in elastic media and to compressional sound waves. It holds statistically for particle sources in a vacuum, such as radioactive atoms, provided there are no electromagnetic fields and no mutual interactions. The term is also used for static field laws such as the law of gravitation and Coulomb’s law in electrostatics. William R. Smythe
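A one-line computation illustrates the ratio form of the law; the two distances below are arbitrary example values.

```python
# Intensity ratio for a point source: I1/I2 = (r2/r1)**2.
def intensity_ratio(r1, r2):
    """Ratio I1/I2 of intensities measured at distances r1 and r2 from a point source."""
    return (r2 / r1) ** 2

r1, r2 = 1.0, 3.0                      # hypothetical distances
print("I1/I2 =", intensity_ratio(r1, r2))   # 9.0: tripling the distance cuts intensity ninefold
```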
Invertebrate embryology The study of the development or morphogenesis and growth of the invertebrates. The same general principles of development apply to the invertebrates as to the vertebrates. Actually, much of the basic knowledge of embryology has been the result of studies on invertebrates. A common phenomenon in the invertebrates is the release of a free and independent form, the larva, before development is completed. The larvae vary considerably and are characteristic of the different animal groups. Embryonic development begins with the formation of the gametes in a specialized cell bearing the haploid or N number of chromosomes. See GAMETOGENESIS. Spermatogenesis. Number, compactness, and mobility are important for spermatozoa. Toward this end, the process of spermatogenesis consists of a stage of cell proliferation, followed by a period of progressive concentration and streamlining. The essential, heredity-determining material of the chromosomes is packed tightly into a tiny nucleus. The cytoplasm forms the locomotor apparatus, usually
Fig. 1. Spermatozoa and fertilizable eggs. (a) Sea urchin spermatozoon. (b) Marine annelid (Nereis) egg, with intact germinal vesicle containing a nucleolus. Spheres surrounding germinal vesicle are oil droplets. (c) Marine mussel (Mytilus) egg, germinal vesicle broken down but polar bodies not formed. (d) Mussel (Mytilus) spermatozoa, acrosome intact. (e) Mytilus spermatozoa, acrosomal reaction.
a single long flagellum with a centriole at its base and a mitochondrion nearby, as well as an organelle (acrosome) for penetrating the egg coverings. Excess cytoplasm is finally discarded, and the mature spermatozoon (Fig. 1a), ready to take part in fertilization, is a self-contained, stripped-down unit, carrying the hereditary message in code, and provided with enough energy source to propel it in a sustained burst of activity on its one-way trip to join an egg of its species. Millions upon millions of such cells are produced in the testis, where they remain quiescent until they are spawned. See SPERM CELL; SPERMATOGENESIS. Oogenesis. The egg is designed for a very different role. It must contain enough substance to provide structural material for the formation of a set of simple organs of locomotion and digestion so that the young animal can secure food to carry on its further growth. It must also contain enough energy-yielding material to perform the work of dividing the single egg cell into body cells from which such organs can be formed and to synthesize the complex chemical substances needed to provide each of these new cells with a new nucleus. See OOGENESIS; OVUM. The egg, therefore, is specialized for large size and protection of its contents, with less concern for numbers and none at all for motility. In addition, its cytoplasm possesses intrinsic capacities for differentiation and building in exact accordance with the specifications contained in its chromosomes, so that a spider egg, for example, always produces a spider and never a fly. The fact that the physical bases of these capacities have so far eluded most of the efforts directed toward their detection in no way casts doubt on their existence. The reserve building and energy-yielding materials are stored in the egg cytoplasm as minute spheres or platelets of yolk, a stable lipoprotein substance.
Eggs are large cells even without this inert material. At the end of their growth period, when they have accumulated the full amount of yolk, they are huge in comparison to the body cells of the parent animal. No invertebrate eggs, however, achieve the spectacular dimensions of bird eggs. The largest are found among the arthropods [crayfish eggs are 0.1 in. (2.5 mm) in diameter], while some marine animals have very small eggs (oyster eggs are about 65 micrometers). During the growth period, while the egg cell is actively synthesizing yolk and increasing the amount of cytoplasm, it has a very large nucleus, the germinal vesicle (Fig. 1b). When it reaches full size, however, and this synthetic activity subsides, the nuclear membrane breaks down, releasing its contents into the cytoplasm. The two successive nuclear divisions of meiosis follow, but the cytoplasm, instead of dividing equally, pushes out one of the daughter nuclei each time as a polar body. These two minute bodies have no further function in development. The chromosome material left in the egg forms the egg pronucleus, which is ready to unite with the sperm pronucleus. The zygote nucleus, formed by their union, is comparable in size to those of the body cells. Egg polarity. Many types of eggs show structural departures from radial symmetry which indicate that the unfertilized egg is organized around a bipolar axis, one end of which is called the animal pole and the other the vegetal pole. The polar bodies are given off from the animal pole, and the egg pronucleus remains in this region. When an egg contains conspicuous amounts of yolk, it is usually concentrated in the vegetal half of the egg. Egg membranes. Since the eggs of invertebrates are often shed directly into the water of oceans and streams, or laid to develop in places where they are exposed to the drying action of air and sunlight, they are always surrounded by a protective covering. In some forms the eggs are laid in batches which may be enclosed in a leathery sac or embedded in a mass of jelly. In other cases each egg has its own separate membranous case, a layer of jelly, or a more complex system of protective structures. Sperm and egg of each individual species have been shown by light and electron microscopy to be characteristic of its particular species. Mechanisms have evolved that are not fully understood which normally prevent the egg of one species from being fertilized by the sperm of another. Factors responsible for this species specification include egg coats, surface antigens and receptors, and environmental barriers such as substrate, temperature, and variations in salinity. Preliminaries to fertilization. Sperm cells must complete all the nuclear and cytoplasmic changes and be fully mature before they can take part in fertilization. On the other hand, while this is true of the cytoplasm of egg cells, in most species the nuclear preparation for fertilization is incomplete when sperm entrance takes place. Moreover, the degree of incompleteness varies widely and apparently at random.
Invertebrate embryology The marine annelid Nereis sheds its eggs with the germinal vesicle intact (Fig. 1b). The entrance of a spermatozoon stimulates the egg to begin the reduction divisions, and the sperm pronucleus waits within the egg cytoplasm while the egg nucleus carries out its preparation for union. Sea urchin eggs are at the other extreme in this respect. Their reduction divisions are completed in the ovary, and the union of egg and sperm pronuclei follows immediately upon sperm entry. In many other species meiosis begins shortly before the eggs are spawned and stops at one stage or another of the process until the entrance of a spermatozoon sets it going again. Reproduction among the invertebrates takes place in a variety of ways which differ widely from phylum to phylum. For example, following copulation in Drosophila the spermatozoa are stored in a special part of the female reproductive tract and released a few at a time as each ovum passes down the oviduct to be laid. The spermatozoa of squids are packaged in small bundles (spermatophores). These are placed by the male, using one of its arms which is modified for the purpose, in spermatophore receptacles on the body of the female. The eggs are fertilized as they leave the oviduct and are laid singly or in fingershaped egg masses fixed to underwater objects. Most of the echinoderms and many mollusks, ascidians, annelids, and coelenterates shed their eggs in tremendous numbers into the seawater where fertilization takes place. The young larvae must usually fend for themselves. In these same groups, however, some species shelter the young in seawater-containing chambers and pockets which are not actually within the body of the parent animal. This is also the case with arthropods such as crabs, which carry the larvae in masses fixed to the abdomen of the female, and some bivalves in which the young develop for a time within the mantle chamber. But whether the fertilized eggs are thus protected by the parent animal, laid in jelly masses or leathery cases or carefully constructed brood cells (bees, hunting wasps), or simply thrown into the water, each egg is an independent unit capable of developing into an adult without any further contribution from the parents. Fertilization. The course of events of fertilization has been studied in several invertebrates, but especially in the sea urchin. Egg and sperm of the sea urchin are released into the seawater. The eggs are covered with a jelly coat to which a receptor on the plasma membrane of the fertilizing sperm binds. The plasma and outer acrosomal membranes of the sperm break down and fuse with each other as a Ca2+ influx occurs; the hydrolytic enzymes within the acrosome are released to lyse the egg coat. Next the inner acrosomal membrane everts by the polymerization beneath it of actin, and forms the acrosomal process which makes contact and fuses with the egg plasma membrane. The egg responds to the sperm by forming a fertilization cone. The sperm nucleus enters the egg, and its DNA swells to form the male pronucleus.
As the sperm binds to the receptors on the egg plasma membrane, the electrical potential of the egg membrane changes and establishes a rapid block to polyspermy. This prevents further sperm from making contact and fusing with the egg. With sperm-egg membrane fusion, Ca2+ is released to activate a series of changes in the egg. Oxygen consumption changes, a mitotic factor is released, and cortical granule release is initiated. The cortical granules are compact granules beneath the surface of the egg. At their release (Fig. 2), the membranes of egg and cortical granules fuse, and the contents of the granules are released into the perivitelline space surrounding the egg. One component of the granules fuses with the vitelline membrane, becomes strengthened, and lifts off the egg to become the fertilization membrane. A second component of the granules takes up Ca2+ and forms the hyaline layer close to the egg surface. As changes occur at the egg surface, the egg pronucleus and the sperm pronucleus with associated astral rays move toward the center of the egg, where they fuse.
Fig. 2. Formation of the fertilization membrane in the sea urchin egg. (a) Surface of the unfertilized egg. (b) Explosion of the cortical granules: the vitelline membrane begins to be lifted up while the dark bodies are extruded, and the egg plasma membrane has become continuous with the membrane bounding the cortical granules. (c) The dark bodies have joined the vitelline membrane; the hemispheric globules begin to build up a layer over the new egg surface; this will then become the hyaline layer as indicated in d. (d) The dark bodies have become fused with the vitelline membrane, thus giving rise to the definitive fertilization membrane.
Fig. 3. Symmetry of cleavage patterns and invertebrate blastulae. (a, b) Spiral cleavage. (c) Bilateral cleavage. (d) Radial cleavage. (e) Irregular cleavage. (f) Sea urchin blastula. Relatively uniformly sized blastomeres and large blastocoele. (g) Annelid blastula. Blastomeres at vegetal side large, yolky; small blastocoele. (h) Squid blastula. Blastomeres in animal pole region only; slitlike or no blastocoele.
The union of the two pronuclei (syngamy) marks the completion of the fertilization process. The fusion forms the zygote nucleus, with the full complement of chromosomes, and the dormant egg cell has been aroused to start the series of changes which will produce a new sea urchin (Fig. 3). A number of the events which have just been described are peculiar to sea urchins. With different time schedules and allowance for the individual characteristics of each species, however, these basic processes of sperm entry, aster formation, and syngamy make up the complex phenomenon of the fertilization reaction as it occurs in all animals. Such a descriptive presentation suggests many questions about what is actually going on in terms of cellular actions and reactions. One such question, for example, concerns the nature of the contact between spermatozoon and egg surface. It is known that in many invertebrates the acrosome at the anterior tip of the sperm cell undergoes an explosive reaction at the egg surface which transforms it into a very slender filament (Fig. 1d and e). In at least some cases an enzymelike substance is also released from the acrosome which has a dissolving effect on the egg membrane. There is evidence that the penetration of the egg surface by this filament activates the egg mechanically. Whether it is also useful in drawing the spermatozoon inside the egg cytoplasm has yet to be proved. The nature of the specificity which ensures that eggs will be entered only by spermatozoa of their own species has also been the object of a great deal of research and has not received a thoroughly satisfactory explanation. The mechanism by which an egg that has received one spermatozoon rejects all later arrivals is another problem that resists solution, but perhaps more difficult to discover than the answers to any of these questions are those concerning the cytoplasmic differences between unfertilized and fertilized eggs. See FERTILIZATION (ANIMAL).
Cleavage. The fertilized egg, or zygote, sets about at once to divide the huge mass of the egg into many small cells in order to restore the usual ratio between the amounts of nuclear and cytoplasmic substances. The energy for these repeated mitoses comes from the yolk, which also furnishes at least part of the materials required for synthesis of new nuclear structures. During this cleavage period, which commonly occurs during the first 12 h after fertilization, the blastomeres, as the cleavage stage cells are called, divide more or less synchronously. Generally, cleavage follows one of several patterns characteristic for large groups of animals and often correlated with the amount and mode of distribution of the yolk. Whatever cleavage pattern is followed, the plane of the first cleavage passes through the animal pole. When the vegetal region contains a large proportion of yolk, cleavage is retarded in this area and the blastomeres tend to be larger than in the animal pole region. Small eggs, which contain little yolk, divide completely and usually very regularly, forming a mass of cells that shows spiral (mollusks, Fig. 3a and b), bilateral (ascidians, Fig. 3c), or radial symmetry (echinoderms, Fig. 3d). Some coelenterates, however, cleave into what appear to be random masses of cells (Fig. 3e). The very large eggs of squid contain a great deal of yolk concentrated at the vegetal pole. The cleavage furrows do not cut all the way through this part but restrict their activity to the living cytoplasm at the animal pole. Insect eggs also contain a large store of yolk, which occupies the center of the elongate cells and is surrounded by a thin layer of living cytoplasm containing the egg pronucleus. Following fertilization, the nuclei alone divide and move apart in the layer of cytoplasm after each division so that they distribute themselves all around the egg. After nine such nuclear divisions have taken place (producing
Invertebrate embryology 512 nuclei), the cytoplasm also cleaves at the next division, forming a single layer composed of about 1000 cells surrounding the central yolk mass. See CLEAVAGE (DEVELOPMENTAL BIOLOGY). Blastula stage. Among all the invertebrate forms except the insects, the result of 6–10 successive cleavage cycles is the formation of a sphere (blastula) composed of small cells which lie in a single compact layer around a central cavity (blastocoele). If the egg has contained relatively little yolk, the blastocoele is rather large (Fig. 3f), while it may be very small if the egg includes much yolk (Fig. 3g) and is little more than a slit in the squid blastula (Fig. 3h). See BLASTULATION. Gastrula stage. The end of the brief blastula stage occurs when the process of gastrulation begins. In its simplest form, this consists in an indenting (invagination) of the blastula wall in the vegetal region (Fig. 4a). Meanwhile cell division is going on steadily, and since the larva has as yet no way of taking in solid food from the outside, all the form changes which occur during this period are accomplished with the material originally present in the fertilized egg. The only addition is water (blastocoele fluid) and such dissolved substances, mostly salts, from the environment as can enter through the cell membranes. As the blastomeres become smaller and the blastular wall becomes correspondingly thinner, cells are provided to extend the vegetal indentation into a pocket (Fig. 4b). With the appearance of this structure (primitive digestive tract) the larva becomes two-layered, possessing an outer layer, the ectoderm, which will later produce the nervous system as well as the outermost body covering, and an inner layer, the endoderm, from which will be formed the lining of the functional digestive tract and its associated organs and glands. As the primitive digestive tract extends into the blastocoele, its opening to the outside becomes smaller and is known as the blastopore. A modification of this process of endoderm formation occurs among some species having large, yolkfilled vegetal blastomeres (Fig. 4c). The small, actively dividing cells of the animal pole region spread down to cover these more inert blastomeres (Fig. 4d), which become the endoderm and later form the digestive organs, while the overlying ectoderm leaves a small opening in the vegetal region which corresponds to the blastopore (Fig. 4e). See GASTRULATION. Mesoderm formation. At this time the first few cells belonging to a third body layer, the mesoderm, make their appearance by slipping from the ectoderm layer into the blastocoele. These early mesoderm cells are of a primitive sort (mesenchyme), possessing pseudopodia and often moving about freely between the ectoderm and endoderm. In sponges and coelenterates, no more highly organized middle layer is formed even in adult animals, but in the other phyla the “true” mesoderm is endodermal in origin, either being formed by successive divisions of a cell which originally belonged to the endoderm (Fig. 4f and g), as in annelids and mollusks, or separating off from
Fig. 4. Gastrulation and larval mesoderm formation. (a) Early and (b) later stage in nonyolky eggs. (c) Late blastula, (d) early gastrula, and (e) late gastrula in yolky egg of the snail. (f) Mesoderm formation in the limpet (Patella) shown in sections through center of blastula and (g) trochophore larva. (h) Cross section of Branchiostoma embryo immediately following gastrulation and (i) somewhat later.
the primitive digestive tract, as in Branchiostoma (Amphioxus) [Fig. 4h and i]. In either case this mesodermal tissue spreads out between the ectoderm and endoderm, and in all phyla more advanced than the flatworms it splits through its center into an inner and an outer layer. The cavity thus formed within the mesoderm is the true body cavity in which the various internal organs lie. The outer layer of mesoderm becomes closely applied to the inner side of the ectoderm, forming body-wall muscles and other supporting layers, while the inner layer of mesoderm surrounds the endoderm with layers of muscle. The organs of circulation, excretion, and reproduction, as well as all muscles and connective tissue, are eventually formed
from this mesodermal layer which surrounds the endoderm. Later development. So far it is possible to summarize the development of invertebrate animals as a group but beyond this point each subgroup follows its own course, and these are so widely divergent that every one must be considered separately. Meaningful generalizations are not even possible within a single class in some cases, as attested to by the various modes of development occurring among the Insecta, some of which proceed directly from egg to adult form, while others go through an elaborate series of changes. See INSECT PHYSIOLOGY; INSECTA. In very many species there is a sharp break in the life history when the larva, after passing through a number of morphological phases which lead from one to the next with a steady increase in size and complexity, abruptly forms a whole new set of rudimentary adult organs which take over the vital functions. This metamorphosis represents the end of the larval period. The tiny animal which it produces is for the first time recognizable as the offspring of its parents. For more or less arbitrary reasons, the developmental processes of certain invertebrate forms have been studied very carefully so that their life histories are fully known. A few of these will be outlined in the following sections. Molluscan Development The eggs of Mytilus, the common mussel, are fertilizable just after the germinal vesicle breaks down (Fig. 1c). As the first polar body is given off from the animal pole, the vegetal surface of the egg bulges out to form the so-called polar lobe (Fig. 5a–d). The bulge disappears shortly, but will reappear at the time of the second polar body formation. Cleavage. When the egg cleaves, the vegetal cytoplasm is segregated into a more extreme polar lobe (Fig. 5e and f) and the cleavage furrow divides the remaining material equally between two blastomeres. The constriction forming the polar lobe disappears, returning the polar lobe material to one of the blastomeres (Fig. 5g and h). The vegetal material is again segregated at the second cleavage and again mixed with one of the four blastomeres. It is characteristic of this type of cleavage that the mitotic spindles lie aslant in the blastomeres and, moreover, regularly change the direction of their slant by 90° at each division so that a spiral pattern of blastomeres results. Such spiral cleavage is found in the mollusks and in the flat, round, and segmented worms. In modified form it is also found in the crustaceans. Since the animal-vegetal axis is easy to recognize in such eggs, it has been possible to record the course of cleavage very accurately and to determine the role of particular blastomeres in normally developing embryos. The four-cell stage blastomere containing the polar lobe material is designated as D, and proceeding in a clockwise direction, the others become A, B, and C (Fig. 5i). At the third cleavage, these
Fig. 5. Maturation and early cleavage in the mussel (Mytilus). (a) First polar body being formed at animal pole; polar lobe at vegetal pole. (b) First polar body completely extruded; polar lobe withdrawn. (c, d) Second polar body formation. (e–g) First cleavage. (h) Two-cell stage. (i) Four-cell stage. (j) Eight-cell stage; first quartet of micromeres (1a, 1b, 1c, and 1d). (k) Sixteen-cell stage with three quartets of micromeres.
divide very unequally (Fig. 5i and k) into four large vegetal macromeres (1A, 1B, 1C, and 1D) and four micromeres at the animal side (1a, 1b, 1c, and 1d). See CELL LINEAGE. Blastula stage. After two more such unequal divisions the resulting 28 micromeres have formed a hollow blastula with the four macromeres, 3A, 3B, 3C, and 3D, at its vegetal side. These then extend into the blastular cavity where their descendants will form the digestive tract, except for one of the D daughter cells produced at the next cleavage, 4d, which is set aside as the mesoderm mother cell (Fig. 4f). Trochophore stage. During the succeeding cleavages some of the cells develop cilia, the blastular symmetry becomes bilateral instead of radial, and the micromeres extend down almost to the vegetal pole, thus covering the macromeres except at the small opening of the blastopore. After 24 h of development the cilia are organized into an encircling girdle and an apical tuft at the animal pole, and the larva, now called a trochophore, begins its free-swimming stage (Fig. 6a). The blastopore is shifted forward by the faster proliferation of the ectodermal cells of the other side (Fig. 6a) and then closed, but the larval mouth is later formed at this place. Behind
Fig. 6. Stages in the development of Patella and gastrulation in a tunicate. (a) Late blastula of Patella; macromeres being covered by micromeres, and blastopore at vegetal pole. (b) Early trochophore of Patella; blastopore shifted forward. (c) Matured trochophore larva of Patella. (d) Veliger larva of Patella. (e) Late veliger stage of Patella. Twisting of body has carried mantle fold and position of anus forward and turned shell about. (f) Late blastula of tunicate embryo. (g) Beginning of gastrulation in tunicate embryo.
it the endoderm forms a stomach, and a narrow tube gradually extends from this to make the intestine. The anus forms later at the place where the intestine reaches the ectoderm. At this stage a group of ectodermal cells is forming the shell gland which will secrete the shell (Fig. 6c). Two small protuberances will unite and develop into the foot, and a pair of elongated pits beside the mouth will form the balancing organs. The 4d blastomere has cleaved into two cells located on either side of the mouth which are giving rise, at this stage, to two rows of mesoderm cells called the mesodermal bands. Veliger stage. Within a week the shell gland has grown and begun to secrete the shell, and the foot is projecting prominently. The stomach increases in size and bulges into the shell cavity, and cells from the ends of the mesodermal bands form muscular attachments for the stomach and esophagus. The girdle of ciliated cells (velum) enlarges, and the rudiments of a nervous system, including eye cups, appear near the apical tuft. The larva is now called a veliger (Fig. 6d). Metamorphosis. Following further development (especially of the alimentary tract, which becomes U-shaped with the mouth and anus separated from each other only by the foot), there is a sudden period of unequal growth in the two sides of the larva so that the anus is moved around to open on its neck (Fig. 6e). Eyes and tentacles have already been formed, and finally the young animal discards its
cilium-bearing velum and takes up the adult habit of creeping about on its foot. Sea Urchin Development In sea urchins, the first and second cleavage planes divide the fertilized egg (Fig. 2) into four equal blastomeres, intersecting each other at right angles through the animal-vegetal axis. The third cleavage cuts through these four blastomeres horizontally. The fourth cleavage plane divides the upper four cells into eight equal-sized mesomeres, and the lower group into four very small micromeres at the vegetal pole and four large macromeres (Fig. 7). These 16 blastomeres are each divided about equally at the fifth cleavage, forming 32 cells, but the eight micromeres fail to divide at the sixth cleavage, giving 56 instead of 64 blastomeres at this stage. By removing certain of these blastomeres (such as the micromeres) and following later development, it has been found that the eight micromeres of 56-cell stage give rise to the first group of mesenchyme cells, the ring of eight cells just above them produces mesenchyme and endoderm, and all other blastomeres form ectoderm. Blastulation. The blastomeres continue to divide successively, forming the hollow sphere of the blastula stage (Fig. 3f). By the tenth cleavage, each of the thousand or so blastomeres has developed a cilium, and the blastula has also secreted enough hatching enzyme to dissolve the fertilization membrane. After about 12 h of development at 72◦F (20◦C),
As this deepens, the primary mesenchyme cells are building calcareous spicules with calcium taken from the seawater. These skeletal rods extend in three directions from two points beside the blastopore; as they lengthen they determine the characteristic shapes of the larval stages (Fig. 7j). Pluteus stage. In the pluteus stage (Fig. 8a and b), a mouth opening is formed where the tip of the primitive digestive tract joins the body wall. The tract begins to function, and the blastopore is changed into an anus as the larva is first able to take in food from outside. Metamorphosis. During the month which the larva spends as a pluteus, the body increases markedly in size and two more pairs of arms are added. In preparation for metamorphosis, a structure called the echinus rudiment forms the beginnings of the adult organ systems. Within a relatively short time most of the larval body is incorporated into these organs, which are recognizable as those of a young sea urchin. Its metamorphosis is complete when it casts off the unusable parts of its larval skeleton (Fig. 9).
Tunicate Development
The fact that certain structures, characteristic of vertebrates and found in no other invertebrates, appear during the larval life of the tunicates forms the basis for giving these otherwise unprepossessing animals their high status at the top of the invertebrate subkingdom and makes them especially interesting from the evolutionary aspect. Fertilization. The eggs of the tunicate Styela begin meiosis as they are laid, going as far as the metaphase
Fig. 7. Cleavage and gastrulation in the sea urchin. (a) Zygote. (b) Two-cell stage. (c) Four-cell stage. (d) Eight-cell stage. (e) Sixteen-cell stage. (f) Thirty-two–cell stage. (g) Vegetal region of blastula flattened; primary mesenchymal cells in blastocoele. (h) Beginning of invagination; secondary mesenchymal cells appearing in blastocoele. (i) Primitive digestive tract deepening; beginning of spicules. (j) Late gastrula, from side.
the larva begins its free-swimming period. Gastrulation. Shortly afterward the larva elongates somewhat toward the animal pole, where a tuft of long, immobile cilia appears, and flattens on the vegetal side. Just before invagination begins, the cells descended from the eight micromeres slip out of the vegetal blastular wall into the blastocoele (Fig. 7g), forming the primary mesenchyme cells. The gastrula stage begins about 20 h after fertilization when the center of the vegetal wall bulges inward (Fig. 7h). Most of the cells which form this pocket will become endoderm, but there are also some mesenchymal cells among them (secondary mesenchyme) which will develop pseudopodia that stretch across the blastocoele and, making contact with the opposite body wall, direct the inward expansion of the primitive digestive tract (Fig. 7i).
Fig. 8. Larval stages. (a) Ventral and (b) side view of pluteus larva of sea urchin. (c) Section through tunicate gastrula. (d) Tadpole stage of tunicate larva.
Fig. 9. Young echinoid Peronella japonica, after metamorphosis. (After K. Okazaki and J. C. Dan, Metamorphosis of partial larvae of Peronella japonica Mortensen, a sand dollar, Biol. Bull., 106:83–99, 1954)
of the first reduction division where they stop until they are fertilized. The spermatozoon penetrates the thick chorion, enters the egg at the vegetal pole, and stimulates it to proceed with meiosis. While the polar bodies are being given off, cytoplasmic streaming segregates the cell components into a yellow-pigmented region, a clear yolk-free region, and a gray yolky mass. It is possible to recognize these differently colored materials later in development and determine the role of each in body layer formation. Cleavage. The first cleavage divides the egg into similar blastomeres. Because of the arrangement of
the colored cytoplasm, its bilateral symmetry is already visible. The 16-cell stage consists of two layers of eight cells each (Fig. 3d), with the yellow cytoplasm contained in four of the vegetal cells. At the stage with about 40 cells a blastula is formed. The prospective ectoderm making up its animal side consists of thick columnar cells, while the future endoderm cells at the vegetal side are relatively flat (Fig. 6f). This difference is reversed before gastrulation begins (Fig. 6g). Gastrulation. The gastrula is formed by the movement into the blastocoele of the vegetal-side cells, followed by an overlapping growth of the prospective ectoderm. Within this enveloping layer the yellow cells produce mesoderm; the other vegetal cells form endoderm. As the gastrula develops, the surface layer anterior to the blastopore (Fig. 8c) forms neural tissue which is organized into a brain and spinal cord, while the mesoderm beneath it forms a notochord, a precursor of the vertebral column characteristic of vertebrate animals. This notochord elongates as the axis of a tail, and the larva hatches from its chorion and begins a free-swimming stage. Tadpole stage. During this stage (Fig. 8d), the tadpole acquires an extensive but nonfunctional digestive tract, two pairs of gill slits (also characteristic of vertebrates), a “cerebral eye,” and a balancing organ. At its anterior end it has three papillae with which it will fix itself to a substratum when its short tadpole stage ends. Metamorphosis. When metamorphosis begins (Fig. 10), the tail ectoderm contracts strongly, bending and breaking up the notochord, nerve cord, and tail muscles which are consumed by phagocytes. The “chin” region next to the organ of fixation elongates greatly, carrying the mouth upward. A new nervous system replaces the larval one. The intestine elongates, fuses with the ectoderm to open as an anus, and forms a stomach and a liverlike gland. A circulatory system which was started during the tadpole stage develops a muscular heart. Four new pairs of gill slits open into the pharynx. These later divide and give rise to further rows of smaller slits. The reproductive organs are formed from two masses of mesoderm cells lying near the pharynx. These develop into an ovary and a testis. Gertrude Hinsch
Fig. 10. Metamorphosis in tunicate. (a) Beginning of metamorphosis. (b) Young adult.
Involute A term applied to a curve C′ that cuts at right angles all tangents of a curve C (see illus.). Each curve C has infinitely many involutes, and the distance between corresponding points of any two involutes is constant. If xi = xi(s), with i = 1, 2, 3, are parametric equations of C, with parameter s arc length on C, all involutes C′ of C have parametric equations Xi = xi(s) + (k − s)x′i(s), with i = 1, 2, 3, where x′i(s) = dxi(s)/ds, with i = 1, 2, 3, and k denotes an arbitrary constant. Let a length of string be coincident with a curve C, with one end fastened at a point P0 of C. If the string is unwound, remaining taut, the other end
397
398
An involute C′ of a curve C.
of the string traces an involute C′ of C. By varying the length of the string, all involutes of C are obtained. See ANALYTIC GEOMETRY. Leonard M. Blumenthal
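The parametric formula above lends itself to a quick numerical illustration. The following Python sketch is an editorial example rather than part of the original article; the function names are chosen here for convenience. It computes points of an involute of the unit circle, parametrized by arc length s, and checks numerically that the involute cuts the tangents of C at right angles.

    import math

    def circle_point(s):
        # Unit circle parametrized by arc length s: x(s) = (cos s, sin s, 0).
        return (math.cos(s), math.sin(s), 0.0)

    def circle_tangent(s):
        # x'(s) = dx/ds, a unit tangent vector because s is arc length.
        return (-math.sin(s), math.cos(s), 0.0)

    def involute_point(s, k):
        # X_i(s) = x_i(s) + (k - s) * x_i'(s), the formula given in the text.
        x, t = circle_point(s), circle_tangent(s)
        return tuple(xi + (k - s) * ti for xi, ti in zip(x, t))

    # A few points of the involute obtained with k = 0.
    for s in (0.5, 1.0, 2.0):
        print(s, involute_point(s, k=0.0))

    # Numerical check of the defining property: the chord between two nearby
    # involute points is (nearly) perpendicular to the tangent of C at s.
    s, k, h = 1.0, 0.0, 1e-5
    p1, p2 = involute_point(s, k), involute_point(s + h, k)
    chord = tuple(b - a for a, b in zip(p1, p2))
    print(sum(c * t for c, t in zip(chord, circle_tangent(s))))  # approximately 0

Varying the constant k in the sketch corresponds to varying the length of the unwound string in the description above.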
Iodine A nonmetallic element, symbol I, atomic number 53, relative atomic mass 126.9045, the heaviest of the naturally occurring halogens. Under normal conditions iodine is a black, lustrous, volatile solid; it is named after its violet vapor. See HALOGEN ELEMENTS; PERIODIC TABLE.
The chemistry of iodine, like that of the other halogens, is dominated by the facility with which the atom acquires an electron to form either the iodide ion I− or a single covalent bond —I, and by the formation, with more electronegative elements, of compounds in which the formal oxidation state of iodine is +1, +3, +5, or +7. Iodine is more electropositive than the other halogens, and its properties are modulated by: the relative weakness of covalent bonds between iodine and more electropositive elements; the large sizes of the iodine atom and iodide ion, which reduce lattice and solvation enthalpies for iodides while increasing the importance of van der Waals forces in iodine compounds; and the relative ease
Some important properties of iodine

Property                              Value
Electronic configuration              [Kr]4d10 5s2 5p5
Relative atomic mass                  126.9045
Electronegativity (Pauling scale)     2.66
Electron affinity, eV                 3.13
Ionization potential, eV              10.451
Covalent radius, —I, nm               0.133
Ionic radius, I−, nm                  0.212
Boiling point, °C                     184.35
Melting point, °C                     113.5
Specific gravity (20/4)               4.940
with which iodine is oxidized. Some properties of iodine are listed in the table. See ASTATINE; BROMINE; CHEMICAL BONDING; CHLORINE; FLUORINE. Iodine occurs widely, although rarely in high concentration and never in elemental form. Despite the low concentration of iodine in sea water, certain species of seaweed can extract and accumulate the element. In the form of calcium iodate, iodine is found in the caliche beds in Chile. Iodine also occurs as iodide ion in some oil well brines in California, Michigan, and Japan. The sole stable isotope of iodine is 127I (53 protons, 74 neutrons). Of the 22 artificial isotopes (masses between 117 and 139), the most important is 131I, with a half-life of 8 days. It is widely used in radioactive tracer work and certain radiotherapy procedures. See RADIOACTIVE TRACER. Iodine exists as diatomic I2 molecules in solid, liquid, and vapor phases, although at elevated temperatures (>200°C or 390°F) dissociation into atoms is appreciable. Short intermolecular I...I distances in the crystalline solid indicate strong intermolecular van der Waals forces. Iodine is moderately soluble in nonpolar liquids, and the violet color of the solutions suggests that I2 molecules are present, as in iodine vapor. Although it is usually less vigorous in its reactions than the other halogens, iodine combines directly with most elements. Important exceptions are the noble gases, carbon, nitrogen, and some noble metals. The inorganic derivatives of iodine may be grouped into three classes of compounds: those with more electropositive elements, that is, iodides; those with other halogens; and those with oxygen. Organoiodine compounds fall into two categories: the iodides; and the derivatives in which iodine is in a formal positive oxidation state by virtue of bonding to another, more electronegative element. See GRIGNARD REACTION; HALOGENATED HYDROCARBON; HALOGENATION. Iodine appears to be a trace element essential to animal and vegetable life. Iodide and iodate in sea water enter into the metabolic cycle of most marine flora and fauna, while in the higher mammals iodine is concentrated in the thyroid gland, being converted there to iodinated amino acids (chiefly thyroxine and iodotyrosines). They are stored in the thyroid as thyroglobulin, and thyroxine is apparently secreted by the gland. Iodine
deficiency in mammals leads to goiter, a condition in which the thyroid gland becomes enlarged. See THYROID GLAND. The bactericidal properties of iodine and its compounds bolster their major uses, whether for treatment of wounds or sterilization of drinking water. Also, iodine compounds are used to treat certain thyroid and heart conditions, as a dietary supplement (in the form of iodized salt), and for x-ray contrast media. See ANTIMICROBIAL AGENTS; ANTISEPTIC; SALT (FOOD). Major industrial uses are in photography, where silver iodide is a constituent of fast photographic film emulsions, and in the dye industry, where iodine-containing dyes are produced for food processing and for color photography. See DYE; PHOTOGRAPHIC MATERIALS. Chris Adams Bibliography. F. A. Cotton et al., Advanced Inorganic Chemistry, 6th ed., Wiley-Interscience, 1999; A. Varvoglis, The Organic Chemistry of Polycoordinated Iodine, 1992; T. Wirth (ed.), Hypervalent Iodine Chemistry, 2003.
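As a worked illustration of how the 8-day half-life quoted above enters tracer and radiotherapy calculations, the short Python sketch below is an editorial example; only the half-life is taken from the text, and the elapsed times are arbitrary.

    HALF_LIFE_DAYS = 8.0  # half-life of iodine-131 quoted in the text

    def fraction_remaining(days, half_life=HALF_LIFE_DAYS):
        # Radioactive decay: N(t)/N(0) = (1/2) ** (t / t_half).
        return 0.5 ** (days / half_life)

    # After three half-lives (24 days) only one-eighth of the activity remains.
    for d in (8, 16, 24, 80):
        print(d, "days:", round(fraction_remaining(d), 4))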
nitrogen atom and three oxygen atoms and carries a single negative charge. Polyatomic ions are usually depicted inside brackets with superscripted charges, as in [NO3]− for the nitrate ion.
Anions and cations can combine to form solid materials called salts, which are named by the cation name followed by the anion name. For a salt composed of the polyatomic ions ammonium and nitrate, the formula is NH4NO3 and the name is ammonium nitrate. For monoatomic ions, the cation name is the same as the element and the anion name is the element name with the ending -ide. Thus, common table salt, NaCl, is called sodium chloride. The ratio of anions to cations must always be such that an electrically neutral material is produced. Thus, magnesium nitrate must contain one magnesium for every two nitrates, giving the formula Mg(NO3)2. See SALT (CHEMISTRY). H. Holden Thorp
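The charge-balance rule used above to arrive at Mg(NO3)2 can be written as a small calculation. The Python sketch below is an editorial illustration; the function name and the extra aluminum sulfate example are not taken from the article.

    from math import gcd

    def neutral_ratio(cation_charge, anion_charge):
        # Smallest whole numbers (n_cation, n_anion) such that
        # n_cation * cation_charge + n_anion * anion_charge = 0.
        q_c, q_a = abs(cation_charge), abs(anion_charge)
        g = gcd(q_c, q_a)
        return q_a // g, q_c // g

    print(neutral_ratio(+1, -1))  # (1, 1) -> NaCl, sodium chloride
    print(neutral_ratio(+2, -1))  # (1, 2) -> Mg(NO3)2, magnesium nitrate
    print(neutral_ratio(+3, -2))  # (2, 3) -> Al2(SO4)3, an additional example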
Ion An atom or group of atoms that bears an electric charge. Positively charged ions are called cations, and negatively charged ions are called anions. When a single atom gains or loses an electron, monoatomic ions are formed. For example, reaction of the element sodium (Na) with the element chlorine (Cl), which occurs as the diatomic gas Cl2, leads to the transfer of electrons from Na to Cl to form Na+ cations and Cl− anions, as shown in the reaction below. 2Na + Cl2 → 2Na+ + 2Cl−
Removal of an electron from an atom generates a species that has one more proton than electrons. For example, a sodium atom contains 11 negatively charged electrons and 11 positively charged protons. If one electron is removed, there will be 10 electrons and 11 protons, generating a positive charge on the sodium ion. Likewise, addition of an electron to an atom generates a species with more electrons than protons, which is therefore a negatively charged anion. In general, atoms of metallic elements (on the left side of the periodic table) lose electrons to form cations, while atoms of nonmetallic atoms (on the right side of the periodic table) gain electrons to form anions. Ions can bear multiple charges, as in the magnesium ion (Mg2+) or the nitride ion (N3−). The charge on monoatomic ions is usually the same for elements in the same column of the periodic table; for example, hydrogen (H), Na, lithium (Li), potassium (K), rubidium (Rb), and cesium (Cs) all form +1 ions. See PERIODIC TABLE. Ions can also comprise more than one atom and are then called polyatomic ions. For example, the ammonium ion (NH4+) carries a positive charge and is composed of one nitrogen atom and four hydrogen atoms. The nitrate ion (NO3−) is composed of one
Ion beam mixing A process in which bombardment of a solid with a beam of energetic ions causes the intermixing of the atoms of two separate phases originally present in the near-surface region. In the well-established process of ion implantation, the ions are incident instead on a homogeneous solid, into which they are incorporated over a range of depths determined by their initial energy. In the simplest example of ion beam mixing, the solid is a composite consisting of a substrate and a thin film of a different material (Fig. 1a). Ions with sufficient energy pass through the film into the substrate, and this causes mixing of the film and substrate atoms through atomic collision processes (Fig. 1b). If the ion dose is large enough, the original film will completely disappear (Fig. 1c). This process may result in the impurity doping of the substrate, in the formation of an alloy or two-phase mixture, or in the production of a stable or metastable solid phase that is different from either the film or the substrate. See ION IMPLANTATION. Like ion implantation, ion beam mixing is a solidstate process that permits controlled change in the composition and properties of the near-surface region of solids. Although still being exploited experimentally and not yet employed commercially, it is expected to be useful for such applications as surface modification of metals and semiconductor device processing. Along with thin-film deposition technology, ion beam mixing should make it possible to introduce many impurity elements at concentrations that would be too high for ion implantation to be practical. Mixing mechanisms. The ion beam mixing of a film and substrate takes place by the following
Fig. 1. Ion beam mixing of film and substrate. (a) Before ion bombardment. (b) Partial intermixing. (c) Complete intermixing.
mechanisms, in increasing order of complexity: recoil implantation, cascade mixing, and radiationenhanced diffusion. In recoil implantation, an atom is driven from the film into the substrate as a result of a direct collision with an incident ion. Cascade mixing takes place when the recoil of an atom from such a collision initiates a series of secondary collisions among the film and substrate atoms. These collisions produce both forward and backward displacements and therefore lead to transfer of atoms from the substrate into the film as well as from the film into the substrate. Both recoil implantation and cascade mixing are kinematic mechanisms whose effectiveness generally increases with the mass, energy, and charge state of the incident ions and with the masses of the film and substrate atoms but is relatively insensitive to temperature. These phenomena usually yield a ratio of intermixed atoms to incident ions no greater than about 10 to 1. Since they are coincident with the atomic collisions, they occur in only about 10−11 s, roughly the transit time of the incident ions through the film of about 100 nm in thickness. Mixing of the film and substrate atoms by the mechanism of radiation-enhanced diffusion requires much longer times, and the rate is strongly temperaturedependent because diffusion is a thermally activated process. This mechanism is a consequence of the
increase in interdiffusion coefficients that results because additional lattice defects (of which the simplest are vacancies and interstitial atoms) are formed by the atomic displacements produced by ion bombardment. Intermixing of several hundred atoms per incident ion can be achieved by heating a bombarded sample to temperatures where the defects become sufficiently mobile for rapid diffusion. See ION-SOLID INTERACTIONS; RADIATION DAMAGE TO MATERIALS. Stable- and metastable-phase formation. When a sample is subjected to ion beam mixing, the nature of the solid phase obtained depends upon the composition of the intermixed region. For samples initially consisting of a thin film on a substrate, the composition obtained by complete mixing is determined by the thickness of the film. An alternate sample configuration is prepared by depositing alternating thin films of different elements on a substrate (Fig. 2), and passage of the energetic ions homogenizes the films on an atomic scale. For this configuration the intermixed composition can be adjusted by varying the relative thickness of the different films. If the composition produced by ion beam mixing corresponds to a solid phase that is thermodynamically stable, this is the phase that will be obtained. Such a phase could be a solid solution of one constituent in the other, or it could be a chemical compound of the two constituents. In addition, ion beam mixing can also produce metastable phases, because it is a highly nonequilibrium process in which the kinetic energy of the bombarding ions is transferred to the target atoms and then very rapidly dissipated by atomic collision and rearrangement. For example, this technique has been used to prepare a continuous series of silver-copper solid solutions, in which the lattice constant varies almost linearly with alloy composition, and the solid solubility of nickel in silver has been increased to more than 40 times the equilibrium value. Splat cooling, the extremely rapid solidification of a liquid phase, is another nonequilibrium technique for producing metastable materials such as supersaturated solid solutions or amorphous phases. However, single-phase solid compositions can be achieved only to the extent that the constituents are miscible in the liquid phase. Thus in the silver-nickel system the effectiveness of splat cooling in increasing the solid solubility is limited
Fig. 2. Ion beam mixing of alternating films.
by the immiscibility in the liquid phase. As a solid-state process, ion beam mixing is not subject to this limitation. Applications. While ion implantation is a well-established technique for modifying the near-surface physical and chemical properties of metals in order to improve such characteristics as wear and corrosion resistance, this technique has the disadvantage that relatively high ion doses may be necessary for effective surface modification. Thus doses of 10^17–10^18 cm^−2 are usually required to achieve impurity concentrations of 1–10 at. % in the near-surface region of about 100 nm in depth. To obtain such doses in a reasonable length of time requires high beam intensities that may be difficult to achieve for metallic elements, which are generally the ones used for direct implantation. This disadvantage can be overcome by the use of ion beam mixing, not only because the ratio of intermixed ions to bombarding ions is much greater than one, but also because the bombarding species can be ions of gases such as nitrogen and argon, for which high beam intensities are easily produced in long-life sources that are commercially available. For example, in initial demonstrations of surface modification by ion beam mixing, the wear resistance of titanium alloys has been substantially improved by the intermixing of a tin film bombarded with nitrogen ions, and the corrosion resistance of copper has been improved by the intermixing of a chromium film bombarded with argon ions. In silicon integrated circuits, metal silicides such as titanium silicide (TiSi2) and tungsten silicide (WSi2) are widely used for low-resistance contacts and interconnects. These materials have conventionally been formed by deposition of a metal film on the silicon wafer and subsequent thermal reaction between the metal and silicon. Ion beam mixing offers the advantage of reducing the temperature required for silicide formation, since penetration of the bombarding ions through the metal-silicon interface disperses interfacial impurities (particularly native oxides of silicon), which generally form a diffusion barrier that impedes reaction. Silicides formed by ion beam mixing exhibit good surface morphology and electrical properties. See INTEGRATED CIRCUITS. Bor-Yeu Tsaur Bibliography. R. E. Benenson et al., Ion Beam Modification of Materials, 1981; J. J. Cuomo, S. M. Rossnagel, and H. R. Kaufman (eds.), Handbook of Ion Beam Processing Technology, 1989; G. K. Hubler et al., Ion Implantation and Ion Beam Processing of Materials, 1984; F. Komarov, Ion Beam Modification of Metals, 1992; S. T. Picraux and W. J. Choyke, Metastable Materials Formation by Ion Implantation, 1982.
Ion exchange The reversible exchange of ions of the same charge between a solution and an insoluble solid in contact with it; or between two immiscible solvents, one of which contains a soluble material with im-
mobilized ionic groups. Ions are atoms or molecules containing charge-bearing groups. Their interactions are dominated by the electrostatic forces between charge centers. These interactions are attractive when the ions are of opposite charge, or repulsive when the ions have the same charge. Ions with a net negative charge are called anions, and those with a net positive charge are cations. A unique property of ions is their capacity to render gases and liquids conducting, and conductivity is a universal method of detecting ions. Ions in solution are in rapid motion and have no distinct partners. Ions in an electric field migrate to the electrode of opposite charge with a velocity roughly proportional to their charge-to-size ratio. This process is known as electrophoresis, and it is one method used to separate and identify ions. See ELECTROPHORESIS. Ions can also be separated on the basis of their equilibrium with a system containing immobilized ions of opposite charge. Ions can be immobilized by virtue of their location in a rigid matrix. Associated with these fixed ionic sites are mobile counterions of opposite charge. Solution ions with a higher affinity than the counterions for the fixed sites will displace them from the fixed sites and remain localized in the vicinity of the fixed sites. Simultaneously the solution is enriched in the counterions originally localized at the fixed sites. This exchange process for ions of the same charge type is called ion exchange. In a column containing the immobilized ions as part of the stationary phase and the solution of competing ions as the mobile phase, the sample ions can be separated by the repeated equilibrium steps involved as they are transported through the column until they exit it, and are detected. This is an example of ionexchange chromatography, an important method of separating and identifying ions. Materials Ion exchange was first recognized over a hundred years ago, when it was found that certain clay minerals in soil can remove potassium and ammonium ions from water, with the release of an equivalent amount of calcium ions. Since that time many naturally occurring ion-exchange substances have been identified, most of them inorganic substances. Also, attention shifted from naturally occurring inorganic ion exchangers to synthetic organic polymers for most practical uses of the ion-exchange process. These materials can be prepared in large quantities with physical and chemical properties custom-made for specific applications. Ion-exchange polymers. Phenol-formaldehyde condensation polymers have largely been replaced by materials based on styrene and divinylbenzene and, to a lesser extent, polymers prepared from divinylbenzene, or a similar cross-linking agent, and acrylic, methacrylic, or hydroxyalkyl methacrylic acids and esters. These are usually prepared in bead form by using the suspension process in which the monomers, catalysts, surfactant, and processing aids are suspended as oil droplets, which after reaction separate from the solvent as spherical beads with a
401
402
narrow size distribution. By varying the experimental conditions and concentration of reagents, beads with a wide range of physical properties can be prepared. Microporous beads consist of a core of entangled polymer chains and are essentially solid beads with narrow pores possessing a small surface area. Macroporous beads are prepared by bead polymerization in the presence of a solvent that is a good solvent for the monomers but a poor swelling agent for the polymer. In this case, each polymer bead is formed from many microbeads joined together during the polymerization process to create a network of holes and channels. The macroporous beads have greater mechanical stability, permeability, and surface area than the microporous beads. Ion exchangers prepared for the isolation or separation of cations must have negatively charged functional groups incorporated into the polymer backbone. The most common groups are sulfonic and carboxylic acids. Sulfonic acid groups are introduced by reacting the polymer beads with fuming sulfuric acid or a similar reagent. Similarly, carboxylic acid groups can be introduced by a number of common chemical reactions or by hydrolysis of the ester group or oxidation of hydroxyalkyl groups in methyl methacrylate or hydroxyalkyl methacrylate polymers, respectively. Other common functional groups used in cation exchangers include phosphoric acid and phenol and, to a lesser extent, phosphinic, arsonic, and selenonic acids. Specialty polymers containing functional groups such as 8-hydroxyquinoline or sulfoguanidine [ CH2S(:NH)NH2] are prepared for the isolation of metals by chelation. A common approach for the preparation of anion exchangers is to react the styrene-divinylbenzene polymer with chloromethyl methyl ether in the presence of a catalyst, which adds the side chain, CH2Cl; then this chloromethylated product is treated with an amine to introduce the charged functional group. A tertiary amine produces a quaternary ammonium group, while primary and secondary amines give products that are charged only in contact with solutions of low pH. As well as simple alkyl and benzyl amines, hydroxyalkyl amines are used to introduce functional groups of the type [ CH2N(CH3)2C2H4OH]+. See QUATERNARY AMMONIUM SALTS. Tentacle ion exchangers consist of an insoluble matrix, copolymerized from oligoethyleneglycol, glycidylmethacrylate, and pentaerythroldimethacrylate, to which are grafted polymerized chains of acrylamide derivatives that are approximately 15–50 units in length. An average of 18 charged groups is covalently bound to each of these tentacles. This arrangement markedly reduces the contact between the analyte and the matrix, thus suppressing nonspecific support interactions, while also allowing ionic interactions between the ion exchanger and the analyte that are sterically impossible with conventional fixed-site ion exchangers. These materials are used for the isolation of biopolymers without loss of activity.
Ion exchangers with a low exchange capacity and fast mass-transfer properties are required for ion chromatography that uses conductivity detection. The time and temperature of the chemical reactions used to introduce the ionic functional groups into the polymer are controlled such that the reactions are limited to the external surface of the beads. Surface-functionalized anion-exchange polymer beads containing quaternary ammonium ions are difficult to prepare directly. An alternative approach is the preparation of agglomerated anion-exchange beads, prepared by taking surfacesulfonated beads and coating them with a layer of quaternary ammonium functionalized latex particles. The core particles are 5–30 micrometers in diameter, and the latex particles are much smaller at about 0.6–0.9 nanometer. The negatively charged sulfonate groups attract the positively charged quaternary ammonium ions and bind the latex particles very strongly to the surface of the polymer bead. Since only a small fraction of the quaternary ions on the latex particles are involved in binding of the anion-exchange layer, the remaining groups are free to function as an anion exchanger. This type of structure creates a thin anion-exchange pellicular layer within which the ion-exchange separation takes place, and an oppositely charged underlayer that excludes anions from the core by repulsion. Silica-based materials. These are used primarily in chromatography because of the favorable mechanical and physical properties of the silica (SiO2) gel support matrix. Ion-exchange groups are introduced by reacting the surface silanol groups of the porous silica particles with silanizing reagents containing the desired functional group (R). This produces a siloxane-type (Si O Si R) bonded phase, where the bonded layer can be monomolecular or polymeric depending on the choice of reagent and experimental conditions. Typical R groups include propylsulfonic acid [ (CH2)3SO3−], benzenesulfonic acid [ C6H4SO3−], butyric acid [ (CH2)3CO2−], propylbenzoic acid [ (CH2)3C6H4CO2−], aminopropane [ (CH2)3NH2], and dimethylaminopropane [ (CH2)3N(CH3)3+]. Alternatively, ion exchangers based on silica and other inorganic oxides can be prepared by polymer encapsulation. In this approach a prepolymer such as poly(butadiene-maleic acid) is deposited as a thin film over the porous support and is subsequently immobilized by radical-induced in-place cross-linking of the polymer chains. This creates an ion-exchange polymer as a thin skin covering all or a sizable part of the support surface. Hydrous oxides. Hydrous oxides of elements of groups 14, 15, and 16 of the periodic table can be used as selective ion exchangers. The most important hydrous oxides used for the separation of organic and inorganic ions are alumina (Al2O3·nH2O), silica (SiO2·nH2O), and zirconia (ZrO2·nH2O). Silica, by virtue of the presence of surface silanol groups, is used as a cation exchanger at pH > 2. Alumina is amphoteric and can be used as an anion
exchanger at low pH and a cation exchanger at high pH. Alumina has the advantage over silica of being chemically stable over a wide pH range. The ion-exchange capacity of silica and alumina is controlled by the pH of the solution in contact with the oxides, since this controls the number of ionized surface functional groups. Alumina is used to isolate nitrogen-containing drugs and biochemically active substances from biological fluids, thus minimizing matrix interferences in their subsequent chromatographic analysis. Group 1 and 2 cations can be separated by chromatography on silica, with aqueous lithium salts as eluants. Zirconium phosphate is not a hydrous oxide, but it has many properties in common with the hydrous oxides. It has a high selectivity for the ions rubidium (Rb+), cesium (Cs+), strontium (Sr2+), and barium (Ba2+) and is used to remove long-lived 90Sr2+ and 137Cs+ from radioactive wastes. See PERIODIC TABLE; PH; RADIOACTIVE WASTE MANAGEMENT. Synthetic carbohydrate polymers. Ion-exchange polymers based on cross-linked dextrans, agarose, and cellulose are used primarily in biochemistry for the isolation and separation of polyelectrolytes such as proteins, enzymes, and nucleotides. The ion-exchange groups are introduced into the hydrophilic matrix by reacting a suitable reagent with the hydroxyl groups on the saccharide subunits. Typical functional groups include carboxymethyl, phosphoric acid, propylsulfonate, diethylaminoethyl, and quaternary amines. Characteristic Properties Ion exchangers are broadly categorized as belonging to one of two groups. Strong ion exchangers, such as those containing sulfonate or quaternary amine groups, remain ionized over the full range of normal operating conditions. The ionization of weak ion exchangers, such as those containing carboxyl or secondary amine groups, depends on the pH of the medium. Their degree of ionization governs their ion-exchange capacity. Capacity. The capacity of an ion exchanger is the quantity of exchangeable ions per unit quantity of exchanger. It is usually expressed in gram-equivalents of ions per kilogram of exchanger, or more conveniently, milliequivalents per gram (meq/g) on a dry basis. General-purpose polymeric ion exchangers have capacities of about 3–5 meq/g. For ion chromatography a high capacity is undesirable because of the need to separate ions with eluants of low ionic strength to facilitate their detection by conductivity, and typical exchange capacities are in the range 0.001–0.1 meq/g. Volume. The amount of cross-linking reagent used in the preparation of bead polymers controls the extent to which the dry polymer swells when placed in solution, the speed with which ions can diffuse in and out of the ion exchanger, and to some extent the selectivity of the ion exchanger for ions of different charge-to-size ratios. Styrene-divinylbenzene polymers containing less than about 8 mole % of divinylbenzene are classified as soft. When suspended
in a solvent, the dry polymer absorbs the solvent, changing its volume. The observed change in volume depends on properties of the solvent as well as those of the polymer. When the polymer comes into contact with solvent, the outermost functional groups are solvated, and the randomly arranged polymer chains unfold to accommodate the larger solvated ions. A very concentrated internal solution of fixed ions and counterions exists, and the mobile counterions tend to diffuse out of the exchanger to be replaced by solvent in an attempt to reduce the difference in ionic strength between the internal and external solutions. This generates the swelling pressure, which may be as high as several hundred atmospheres for a high-capacity exchanger. The crosslinking of the polymer chains resists this swelling with different degrees of success that depend on structural and environmental factors. High-pressure liquid chromatography requires materials that are rigid and do not swell or change volume in different solvents. A high level of cross-linking is invariably required in this case. Size. Polymer beads are available in a wide range of sizes and in narrow size ranges. Large beads (2– 0.1 mm in diameter) are used in industrial processing, beads of intermediate size (0.2–0.04 mm) for sample processing and chromatography at limited pressures, and small beads (0.04–0.004 mm) for highperformance liquid chromatography at high pressures. Silica-based materials are available in a similar size range and are widely used in high-pressure liquid chromatography. The carbohydrate-based polymers are generally unstable to high pressures and are usually available in the large and intermediate size ranges only. The superior mass-transfer and favorable flow anisotropy properties of small-diameter particles are essential for modern liquid chromatography. Diffusion of ions through polymer gels is perhaps only one-tenth as fast as in liquids, and the only practical way to reduce this contribution to band broadening in chromatography is to minimize the particle diameter to reduce the average path length for migration of ions through the beads. In the surface-functionalized polymers used in ion chromatography, the migration distance for the exchangeable ions is restricted to the thickness of the active layer at the bead surface, which is only a few micrometers thick. From the perspective of performance characteristics, the silicabased materials are usually better, but the difference in properties between the polymer bead exchangers and the silica-based materials is no longer as significant as before. Stability and adsorption. The stability and adsorption properties of the matrix are important in selecting an ion exchanger. Silica-based materials, for example, are restricted to use within the pH range 2– 8 to avoid dissolution of the silica backbone at high pH and hydrolysis of the surface-bonded siloxane ligands at lower pH. Poly(styrene-divinylbenzene) polymers can be used at all pHs. For biopolymers the choice of matrix may be critical to the recovery of materials with preservation of biological activity. See LIGAND.
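To make the capacity figures quoted above concrete, the following Python sketch converts an exchange capacity into the mass of a divalent ion that one gram of dry exchanger can bind. It is an editorial example; the 4 meq/g capacity (within the 3–5 meq/g range given in the text) and the molar mass of calcium are assumed illustrative values.

    def mass_bound_mg(capacity_meq_per_g, ion_charge, molar_mass_g_per_mol, resin_mass_g=1.0):
        # One milliequivalent of an ion corresponds to (molar mass / charge) milligrams,
        # so mass bound (mg) = capacity (meq/g) x resin mass (g) x equivalent mass (mg/meq).
        equivalent_mass_mg = molar_mass_g_per_mol / ion_charge
        return capacity_meq_per_g * resin_mass_g * equivalent_mass_mg

    # Assumed example: a 4 meq/g cation exchanger binding Ca2+ (molar mass about 40.08 g/mol).
    print(round(mass_bound_mg(4.0, 2, 40.08), 1), "mg of Ca2+ per gram of resin")  # about 80 mg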
Applications
Ion exchange has numerous applications for industry and for laboratory research. By the quantity of materials used, water conditioning is the most important. Ion exchange is one of the primary analytical methods used to identify and quantify the concentration of ions in a wide range of environmental, biological, and industrial samples. Water softening. Natural water from rivers and wells is never pure; it is usually hard, that is, it contains calcium and magnesium salts that form curds with soap and leave hard crusts in pipes and boilers. Hard water is softened by passage through a cartridge or bed of cation exchanger in the sodium form (the mobile counterions are sodium in this case). The calcium and magnesium ions have a higher affinity for the fixed ion-exchange sites than sodium ions and replace the sodium ions on the exchanger. The sodium ions do not cause hardness, so their increased concentration in the conditioned water is not a problem. Eventually the cation exchanger becomes exhausted, that is, fully loaded with magnesium and calcium ions, and must be regenerated or replaced. The cation exchanger is regenerated by passing a concentrated solution of sodium chloride through the bed of ion exchanger. See WATER SOFTENING. Deionized water. Many industrial and laboratory processes require a supply of pure water with a very low concentration of salts. This can be achieved by passing water through a bed of mixed strong cation exchanger in the hydrogen form and a strong anion exchanger in the hydroxide form. The cation exchanger removes all the cations from the water by replacing them by hydrogen ions. The anions are removed by the anion exchanger and replaced by hydroxide ions. The hydrogen and hydroxide ions combine to form water. The purity of the water can be monitored by its conductivity. Inclusion of a cartridge containing carbon to adsorb neutral organic compounds produces a reliable supply of very pure water containing only parts-per-billion levels of ions and organics. Mixed-bed ion exchangers tend to be more efficient, but they are difficult to regenerate. The series coupling of cation-exchange and anion-exchange cartridges allows for ease of regeneration (Fig. 1). Environmental reclamation. Toxic ions such as mercury (Hg2+), lead (Pb2+), chromate (CrO42−), and ferrocyanide (Fe(CN)64−) are removed by ion exchange from industrial wastewaters prior to their discharge into the environment. Ion exchangers are used to recover precious metals such as gold (Au+), platinum (Pt+), and silver (Ag+) in a useful form from mine workings and metalworking factories. Ion exchange is frequently used to decontaminate waste and concentrate radioactive elements from the nuclear industry. Uranium can be recovered from low-grade ores by leaching with dilute sulfuric acid, then absorbing the uranium sulfate complex ions on a strong anion exchanger, which has a high affinity for uranium sulfate. See HYDROMETALLURGY; SOLUTION MINING; URANIUM METALLURGY.
Fig. 1. Two-stage deionization by ion exchange. Valves are for regeneration with sulfuric acid (H2SO4) and sodium hydroxide (NaOH).
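The water-softening operation described above can be sized with a rough estimate. The Python sketch below is an editorial illustration; the bed mass, the hardness of the feed water, and the assumption that the full nominal capacity is usable before regeneration are all assumed values rather than figures from the article.

    def liters_softened(bed_mass_kg, capacity_meq_per_g, hardness_meq_per_L):
        # Total exchange capacity of the bed (meq) divided by the hardness load
        # per liter of feed (meq/L) gives the volume treatable before exhaustion.
        total_capacity_meq = bed_mass_kg * 1000.0 * capacity_meq_per_g
        return total_capacity_meq / hardness_meq_per_L

    # Assumed example: 10 kg of resin at 4 meq/g treating feed water containing
    # 5 meq/L of calcium plus magnesium hardness.
    print(round(liters_softened(10.0, 4.0, 5.0)), "liters before regeneration")  # about 8000 L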
Chemical analysis. Ion exchange is used on the laboratory scale for isolation and preconcentration of ions prior to instrumental analysis and to obtain preparative scale quantities of material for use in laboratory studies. For example, organic acids can be isolated from urine with a strong cation exchanger free of interference from the bulk of nonionic water-soluble organic compounds normally present in urine. Trace quantities of ions in ice from the polar regions can be concentrated by ion exchange for subsequent separation and identification by ion chromatography. Ion exchange is often employed in conjunction with activation analysis to isolate individual elements for quantification by radiochemical detection. Isolation schemes for individual elements in complex mixtures by ion exchange employing multiple changes in buffers and complexing agents are well known. These methods are slow and tedious but reliable, and they serve as reference methods to check the validity of fast routine methods of analyzing complex materials. See ACTIVATION ANALYSIS; ANALYTICAL CHEMISTRY. Biotechnology. Biotechnology requires reliable, efficient methods to purify commercial-scale quantities of proteins, peptides, and nucleic acids for use in the pharmaceutical, agricultural, and food industries. Ion exchange is widely used in the isolation and purification of these materials. Typical applications include the removal of ionic compounds used in the production process, the elimination of endotoxins and viruses, the removal of host-cell proteins and deoxyribonucleic acid (DNA), and the removal of potentially hazardous variants of the main product. Since the biological activity of proteins is closely related to their three-dimensional structure and this structure is labile, one of the main reasons for choosing ion-exchange methods for the isolation and purification of proteins is that weakly denaturing conditions can be employed throughout the process steps. For commercial exploitation of biotechnology
products, the retention of biological activity is as important as recovery; and if the product is destined for human or animal consumption, biological impurities with adverse reactions must be reduced to a specified control range. Chromatography. Modern chromatographic techniques employ ion exchangers of small particle size and favorable mass-transfer characteristics, and operate at high pressures, providing better resolution of mixtures in a shorter time than with conventional gravity-flow-controlled separations. In clinical laboratories, ion exchange has long been employed as the basis for the routine, automated separation of amino acids and other physiologically important amines used to identify metabolic disorders and to sequence the structure of biopolymers. Under typical conditions the amino acids are separated in the protonated form on a strong cation exchanger with stepwise changes in pH by using various combinations of citrate and borate buffers. The amino acids are usually detected as their fluorescent products after a postcolumn reaction with ninhydrin or o-phthalaldehyde. See AMINO ACIDS. Probably the oldest and most common method of separating carbohydrate mixtures is by ligand-exchange chromatography using polymeric cation exchangers loaded with metal ions (for example, calcium for separating sugars, and silver for determining the oligomer distribution of oligosaccharides), with aqueous organic solvents as the mobile phase. Retention in this case is governed by a combination of size exclusion and electrostatic attraction between the electronegative sugar oxygen atoms and the electropositive metal cations. Carbohydrates are also separated by anion-exchange chromatography after they are converted to negatively charged borate complexes or as their negative ion form at high pH in aqueous solution. Ion chromatography is an ion-exchange-based separation system that changed the practice of ion analysis by replacing many tedious wet-chemical procedures with a simple automated instrument that can determine several ions simultaneously in a single method (Fig. 2). Its evolution into a major instrumental method of analysis was a synergistic development of efficient low-capacity ion-exchange column packings that allowed the use of low-concentration eluants containing competing ions with a high affinity for the ion exchanger, together with the provision of continuous on-line detection based on conductivity monitoring of the column effluent. Ion chromatography has been used extensively for the determination of ions in drinking water, wastewater, plating baths, detergents, foods, and many other applications. It is almost always the method of choice for determining anions and organic cations and has become popular for the separation of metal ions. See CHROMATOGRAPHY; LIQUID CHROMATOGRAPHY. Ion-pair chromatography. Ion-pair chromatography provides an alternative method of separating ions without the use of an ion-exchange column. A temporary ion-exchange layer is formed by the adsorption of hydrophobic ions onto the surface of a reversed-
Fig. 2. Chromatograms showing separation of (a) common anions and (b) alkaline-earth cations by ion chromatography with conductivity detection. (After C. F. Poole and S. K. Poole, Chromatography Today, Elsevier, 1991)
phase column packing material. The hydrophobic ions are added to the mobile phase in a fixed concentration in equilibrium with the ions adsorbed onto the stationary phase, and are present throughout the separation. Suitable hydrophobic ions for separating anions are tetraalkylammonium salts, and for the separation of cations, alkanesulfonate salts. The most important variables in optimizing the separation of ions by ion-pair chromatography are the concentration of the ion-pair reagent, the concentration of organic solvent in the mobile phase, and the mobile-phase pH. An advantage of ion-pair chromatography is that ions and neutral molecules can be separated at the same time by using the efficient column packings developed for reversed-phase chromatography. Colin F. Poole Membranes Ion-exchange membranes are a class of membranes that bear ionic groups and therefore have the ability to selectively permit the transport of ions through themselves. In biological systems, cell membranes and many other biological membranes contain ionic groups, and the conduction of ions is essential to their function. Synthetic ion-exchange membranes are an important field of technology. They are used in fuel cells, electrochemical processes for chlorine manufacture and desalination, membrane electrodes, and separation processes. Ion-exchange membranes typically consist of a thin-film phase, usually polymeric, to which have
Fig. 3. Electrodialysis process, in which the ions of the saline water are separated under the influence of an electric field. (After H. Strathmann, The principle of electrodialysis, in P. M. Bungay et al., eds., Synthetic Membranes: Science, Engineering and Applications, D. Reidel Publishing Co., 1986)
been attached ionizable groups. Numerous polymers have been used, including polystyrene, polyethylene, polysulfone, and fluorinated polymers. Ionic groups attached to the polymer include sulfonate ( SO3−), carboxylate ( COO−), tetraalkylammonium ( N(CH3)4+), phosphonate ( PO3H−), and many others. If the polymer is sufficiently hydrophobic that it will not dissolve in water, it may be formed into a film without further processing, but more hydrophilic polymers may require cross-linking to avoid excessive water solubility. As a result of their ionic charge, ion-exchange membranes always contain counterions (gegen-ions) that have the opposing charge, and they maintain electrical neutrality. On contact with water, the membranes swell as the ions are hydrated, and they fre-
Fig. 4. Bipolar membrane salt-splitting process; water within the membrane splits into hydrogen (H+) and hydroxide (OH−) ions, which combine with the anions and cations to form acid and alkaline products.
quently have sufficient water in their structures to form a second continuous phase. This water is beneficial, since it allows ionic mobility and, by plasticizing the membrane polymers, some polymer mobility also. The water-swollen membrane has more space between adjacent polymer chains (greater free volume), and this allows the polymer chains to adjust their positions to allow ions to pass. The Donnan effect is important to the operation of ion-exchange membranes. For the simple case of univalent ions, the Donnan equilibrium may be written as Eq. (1),

[M+]o [X−]o = [M+]m [X−]m = [MX]^2    (1)

where the o subscripts indicate the solution outside the membrane, the m subscripts indicate the interior of the membrane, the bracketed terms indicate the equilibrium concentrations, and [MX] is the concentration of the dissolved salt MX in the outside solution. Since the membrane contains a large number of fixed charges, the concentration of counterions is also high, and the concentration of co-ions is therefore low. Thus an ion-exchange membrane containing negative fixed charges will readily permit transport of positive counterions, but it will act as a barrier to negative co-ions. The driving force for this transport may be a concentration gradient or an applied electric field. See DONNAN EQUILIBRIUM. Membrane separators. Ion-exchange membranes are frequently used as membrane separators in electrochemical processes. Important among these are fuel cells, batteries, and chloralkali cells. In these applications, the membrane must be a selective conductor for ions, allowing the circuit to close. It thus separates the two half cells, while allowing the passage of ions necessary for the overall process. See FUEL CELL. An example is the process for manufacturing chlorine. A sodium chloride brine is fed to the anode side of the cell, and an electric current is passed through it. Chloride ions are converted to chlorine gas. Sodium ions pass through the membrane to the cathode side, where water is electrolyzed to hydrogen gas, leaving sodium hydroxide. Since the membrane does not pass chloride ions, the resulting sodium hydroxide is quite pure. See ELECTROCHEMICAL PROCESS. Electrodialysis. Electrodialysis is the process whereby saline or brackish water is rendered potable by separating the ions under the influence of an electric field. It is normally carried out by using a membrane stack (Fig. 3), in which cation-exchange membranes and anion-exchange membranes are alternated, and a cathode and anode apply the electric field on opposing sides. The saline solution is fed to all of the intermembrane gaps. Under the influence of the electric field, cations pass through the cation-exchange membranes, and anions through the anion-exchange membranes in opposite directions. The result is that alternate cells are depleted in these ions, while the remaining cells experience a net concentration. The depleted streams form the desalinated product. See DIALYSIS; WATER DESALINATION.
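The co-ion exclusion implied by Eq. (1) can be illustrated numerically. Combining Eq. (1) with electroneutrality inside a membrane carrying negative fixed charges gives a quadratic equation for the co-ion concentration in the membrane. The Python sketch below is an editorial example; it assumes an external solution of a single univalent salt, ignores activity coefficients, and uses arbitrary concentrations for the fixed charges and the external salt.

    import math

    def coion_conc_inside(fixed_charge_molar, salt_conc_molar):
        # Electroneutrality inside a membrane with fixed negative sites:
        #   [M+]m = [X-]m + C_fixed
        # Donnan relation, Eq. (1):
        #   [M+]m * [X-]m = [MX]**2
        # Combining the two gives a quadratic for the co-ion concentration [X-]m.
        c_fix, c = fixed_charge_molar, salt_conc_molar
        return (-c_fix + math.sqrt(c_fix**2 + 4.0 * c**2)) / 2.0

    # Assumed example: 1.0 M fixed charge in the membrane, 0.01 M salt outside.
    x_inside = coion_conc_inside(1.0, 0.01)
    print(f"[X-] inside is about {x_inside:.1e} M, versus 0.01 M outside")  # about 1e-4 M

The result, roughly [MX]^2 divided by the fixed-charge concentration, is the quantitative basis for the statement above that such a membrane passes counterions readily but acts as a barrier to co-ions.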
Salt splitting with bipolar membranes. Salt splitting is related to electrodialysis in technology, but has the goal of converting a salt (MX) into an acid-base pair [HX + MOH; Eq. (2)] for the univalent case.

MX + H2O → HX + MOH    (2)
The key added element is a bipolar membrane that has positive charges on one side and negative charges on the other. This membrane is placed between anion- and cation-exchange membranes (Fig. 4). In operation, water (H2O) is split, forming hydrogen (H+) and hydroxide (OH−) ions within the bipolar membrane. The hydrogen ions combine with anions passing the anion-exchange membrane to form the acid product, while the hydroxide ions combine with cations passing the cation-exchange membrane to form the alkaline product. Membrane electrodes. Membrane electrodes are widely used in analytical chemistry, the best-known example being the pH electrode. The key to their success is an ion-exchange membrane that is highly selective for the ion of interest. Membranes may be composed of glass or crystalline materials, or they may be made of polymeric membranes similar to those used in other membrane applications. Liquid membranes are also in common use in this application. The membrane is placed between the solution to be analyzed and a second solution containing the ion of interest. The difference in activity between these two solutions creates an electrical potential that can be measured. High flux is not necessary for this application, but rapid response is essential. See ELECTRODE; ION-SELECTIVE MEMBRANES AND ELECTRODES. Liquid membranes. Liquid membranes differ in configuration from the previous applications. Instead of a solid or polymeric membrane, a liquid phase that is immiscible with the external solution is used. Liquid membranes may be supported, in which case the immiscible liquid wets a porous solid that holds it in place; or they may be unsupported, in which case the membrane consists of emulsion globules dispersed in an external phase. Regardless of configuration, differences in concentration between the two nonmembrane phases create a driving force for transport of the ions across the membrane. In the simplest case, these concentration differences are for the ion of interest. However, it is also possible to drive the transport process by using a concentration difference in another ion linked to the first by equilibrium processes. For example, in hydrometallurgical processes a high concentration of acid (H+) on the product side of a liquid membrane system has been used to drive the concentration of a metal to this side of the membrane. N. N. Li; S. F. Yates Bibliography. P. M. Bungay, H. K. Lonsdale, and M. N. de Pinho (eds.), Synthetic Membranes: Science, Engineering and Applications, 1983; P. R. Haddad and P. E. Jackson, Ion Chromatography: Principles and Applications, 1990; W. S. Hancock, High Performance Liquid Chromatography in Biotechnology, 1990; N. N. Li and J. M. Calo,
Separation and Purification Technology, 1992; P. Meares, Membrane Separation Processes, 1976; C. F. Poole and S. K. Poole, Chromatography Today, 1991; R. Rautenbach and R. Albrecht, Membrane Processes, 1989; O. Samuelson, Ion Exchange Separations in Analytical Chemistry, 1963; H. Small, Ion Chromatography, 1989; H. F. Walton and R. D. Rocklin, Ion Exchange in Analytical Chemistry, 1990.
Ion implantation A process that utilizes accelerated ions to penetrate a solid surface. The implanted ions can be used to modify the surface composition, structure, or properties of the solid material. This surface modification depends on the ion species, energy, and flux. The penetration depth can be controlled by adjusting the ion energy and the type of ions used. The total number of ions incorporated into the solid is determined by the ion flux and the duration of implantation. This technique allows for the precise placement of ions in a solid at low temperatures. It is used for many applications such as modifying the electrical properties of semiconductors and improving the mechanical or chemical properties of alloys, metals, and dielectrics. See ALLOY; DIELECTRIC MATERIALS; METAL; SEMICONDUCTOR.
Wide ranges of ion energy and dose are applied. For ion energy ranging from 1 keV to 10 MeV, the ion penetration depth varies from 10 nanometers to 50 micrometers. In general, it is difficult to get deeper penetration since extremely high-energy ions are required. As such, ion implantation is a surface modification technique and not suitable for changing the entire bulk property of a solid. Ion dosage also varies depending on the applications. Doses ranging from 10¹⁰ to 10¹⁸ ions/cm² are typically applied. For high-dose applications, ion sources providing high ion currents are needed to keep the implantation time reasonable for production purposes. See ION.
System. An ion implantation system consists of an ion source, an analyzer, an accelerator, a scanner, and a dose integrator at the target. In the ion source, a gaseous mixture is excited to form a plasma containing ions, electrons, and neutral particles. The ions enter the mass analyzer, where they are separated by an adjustable magnetic field for ion selection. The desired ions are chosen based on the charge-to-mass ratio of the ions, and the analyzer is sensitive enough to discriminate against adjacent mass numbers. The ions are then accelerated to the desired ion energy. The ion beam is scanned over the entire solid surface either electrostatically or mechanically. The total ion dose is determined by the charge integrator near the target. When ions are implanted into insulators, an electron source near the solid can be used to supply the surface with electrons to prevent charge buildup. See ELECTRON; PARTICLE ACCELERATOR. Alternative ion sources include focused ion beam
systems and immersed plasma ion systems. In focused ion beam sources, ion lenses are used to form a convergent beam of ions with a small spot. Such sources have the flexibility of depositing ions in the desired areas without the use of a masking layer. Since the spot size can be focused down to 10 nm, selective ion implantation is possible with very high resolution. The drawbacks for focused ion beam sources are long processing times (since it is a "direct-write" serial process) and the limited types of available ions. In an immersed plasma ion source, the solid to be implanted is placed inside the plasma where ions are generated. The electric field between the source and the target accelerates the ions to the desired energy. This technique has the advantages of achieving higher ion density, covering a larger area, and allowing solids with different shapes to be implanted. See ELECTRIC FIELD; ION SOURCES; PLASMA (PHYSICS).
Projected ion range. As ions enter a solid, they undergo a number of collisions with the atoms in the solid until they finally come to rest at some depth below the surface, which is the projected ion range. The projected range is proportional to the ion energy and inversely proportional to the ion mass. The projected ranges (see illus.) for different ion species implanted into silicon increase proportionally to the energy (10 keV–1 MeV). For boron implanted in silicon, which is often used as a p-type dopant for transistors, a high ion energy of 400 keV is required to obtain a 1-µm-deep projected range. Much higher ion energy is needed to achieve projected ranges beyond 1 µm, which is difficult and expensive to accomplish. See BORON; SILICON.
The interactions between the implanted ions and the solid include elastic collisions between pairs of nuclei and inelastic collisions with electrons in the solid. High-energy ions initially lose their energy mostly by electron collisions and travel in a nearly straight path. As the energy is reduced, nuclear collisions begin to dominate and the path becomes a zigzag. Ions stop in the solid when their energy has decreased to zero after a large number of collisions and changes of ion path. The projected range is the average depth of the implanted ions.
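As a rough companion to the doses and ranges quoted above, the following Python sketch (not from the article; the beam current and wafer area are assumed, illustrative values) estimates how long it takes to deliver a given dose with a given beam current, using charge = dose × area × elementary charge.

```python
# Hedged sketch: time to implant a given dose over a given area.
# Assumes singly charged ions and 100% beam utilization (idealized).

Q_E = 1.602e-19          # elementary charge, C

def implant_time(dose_cm2, area_cm2, beam_current_A, charge_state=1):
    """Return implantation time in seconds."""
    total_ions = dose_cm2 * area_cm2
    total_charge = total_ions * charge_state * Q_E
    return total_charge / beam_current_A

# Illustrative numbers (assumptions, not from the article):
# a 200-mm wafer (~314 cm^2), a 1-mA beam, and two dose extremes.
area = 3.14e2
for dose in (1e12, 1e16):        # ions/cm^2
    t = implant_time(dose, area, beam_current_A=1e-3)
    print(f"dose {dose:.0e} ions/cm^2 -> {t:.3g} s")
# Low doses take only a fraction of a second of beam charge, while doses
# toward 1e16-1e18 ions/cm^2 require correspondingly higher currents to keep
# implantation times practical, as the text notes.
```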
Projected range as a function of implant energy for ions (aluminum, boron, silicon, phosphorus, arsenic, and antimony) in silicon. (After J. F. Gibbons, ed., Projected Range Statistics, 2d ed., 1975)
The ion distribution can be approximated as a Gaussian distribution about the projected range, with a standard deviation that characterizes the spread. The concentration of the implanted ions is lower near the surface, peaks at the projected range, and decreases beyond the projected range. The ion dispersion occurs in the vertical as well as the horizontal directions. Typically, ion dispersion decreases as the ion energy or the mass of the implanted ion increases. See ION-SOLID INTERACTIONS.
In crystalline materials, such as single-crystal semiconductors, the atoms are arranged in regular patterns. Ions moving in certain directions can encounter rows or planes of atoms that form long-range open spaces through which the ions can travel without significant collision. Ions can travel down these channels to a much deeper distance, which is called ion channeling. For ions inside a channel, the major energy loss is due to electronic scattering since they have very few collisions with the nuclei. Since the stopping power for electronic scattering is low, the ion channeling path can be much deeper, showing up as a long tail to the implanted distribution. To avoid ion channeling and keep the ion implantation shallow, the angle between the incoming ions and the solid can be tilted, or the ions can be implanted through an amorphous layer. Ion implantation into a silicon wafer is typically carried out by tilting the solid 7◦ to minimize ion channeling. See CHANNELING IN SOLIDS.
Ion damage and annealing. As ions are implanted into a solid, they undergo a series of nuclear collisions. Every time an ion is scattered, part of the energy is transferred to the atoms in the solid, possibly displacing atoms in its path. In addition, the implanted ions generally do not come to rest in the proper sites, and implant-induced damage often occurs due to the impact of the high-energy ions. This damage effect is especially critical for crystalline solids. Thermal annealing is used both to repair the crystalline defects produced by the implanted ions and to allow the ions to move to their proper sites (so that the desired surface modifications can be realized). Annealing at temperatures above 600◦C (1112◦F) for a few minutes is usually needed to remove the ion damage. In some cases, it may be helpful to anneal the damaged solid to cause surface melting, providing regrowth from the liquid phase. During thermal annealing, diffusion (redistribution) of the implanted ions can occur. To limit this diffusion, rapid thermal annealing removes the damage at higher temperatures but with a very short annealing time of a few seconds.
Applications. Ion implantation is used extensively in the semiconductor industry. The fabrication of integrated circuits in silicon often requires many steps of ion implantation with different ion species and energies. The implanted ions serve as dopants in semiconductors, changing their conductivity by more than a factor of 10⁸. To introduce more electrons in silicon, n-type dopants such as arsenic, phosphorus, and antimony are often used, whereas boron is usually used as a p-type dopant to provide more holes in silicon.
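The dopant depth profiles just mentioned follow, to first order, the Gaussian range approximation described earlier in this section. The sketch below is illustrative only (the dose, projected range, and straggle are assumed values, loosely based on the ~1-µm range quoted for 400-keV boron), not data from the article.

```python
import math

# Hedged sketch: Gaussian approximation to an implanted-ion depth profile.
# N(x) = dose / (sqrt(2*pi)*dRp) * exp(-(x - Rp)**2 / (2*dRp**2))
# Rp = projected range, dRp = standard deviation (straggle). Values assumed.

def implant_profile(x_um, dose_cm2, Rp_um, dRp_um):
    """Implanted concentration (ions/cm^3) at depth x (micrometers)."""
    dRp_cm = dRp_um * 1e-4                      # convert µm to cm
    peak = dose_cm2 / (math.sqrt(2 * math.pi) * dRp_cm)
    return peak * math.exp(-((x_um - Rp_um) ** 2) / (2 * dRp_um ** 2))

dose = 1e15          # ions/cm^2 (assumed)
Rp, dRp = 1.0, 0.2   # µm (assumed, roughly 400-keV boron in silicon)

for x in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"x = {x:3.1f} µm  N = {implant_profile(x, dose, Rp, dRp):.2e} /cm^3")
# The concentration is low at the surface, peaks at the projected range,
# and falls off beyond it, matching the qualitative description in the text.
```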
By implanting high doses of oxygen ions in silicon, an insulating oxide layer can be formed underneath the surface, providing a silicon-on-insulator (SOI) structure. These SOI wafers can be used to produce integrated circuits with high speed and power. See ANTIMONY; INTEGRATED CIRCUITS; PHOSPHORUS.
Ion implantation is also used to change the surface properties of metals and alloys. It has been applied successfully to improve wear resistance, fatigue life, corrosion protection, and chemical resistance of different materials. Even though the ion projected range is less than 1 µm, surface treatment by ion implantation can extend the lives of metal or ceramic tools by 80 times or more. Ion implantation can form new compounds such as nitrides on the surface, and the implanted ions can be found at much greater depths than the projected range due to diffusion or mechanical mixing. See CERAMICS. Stella W. Pang
Bibliography. J. F. Gibbons (ed.), Projected Range Statistics, 2d ed., 1975; F. Komarov, Ion Beam Modification of Metals, 1992; J. W. Mayer (ed.), Ion Implantation in Semiconductors, Silicon, and Germanium, 1970; E. Rimini (ed.), Ion Implantation: Basics to Device Fabrication, 1994; J. F. Ziegler (ed.), Handbook of Ion Implantation Technology, 1992; J. F. Ziegler (ed.), Ion Implantation: Science and Technology, 1988.
Ion propulsion Vehicular propulsion caused by the high-speed discharge of a beam of electrically charged minute particles. These particles, usually positive ions, are generated and accelerated in an electrostatic field produced within an ion thruster attached to a spacecraft. Because positive ions cannot be ejected from the thruster without leaving a substantial negative charge on the thruster and spacecraft, electrons must be ejected at the same rate. Ion propulsion systems are attractive because they expel the ions at very high speeds and, therefore, require much less
propellant than other thrusters, such as chemical rockets. The three principal components of an ion propulsion system (Fig. 1) are the power-generation and -conditioning subsystem, the propellant storage and feed subsystem, and one or more ion thrusters.
Power generation. This portion of the system can be broken down into power source and power-conversion and -conditioning equipment.
Power source. The power source can be a nuclear reactor or a radiant-energy collector. In the former, thermal power is released by fission or fusion reactions. Mass-minimization considerations generally define spacecraft subsystem configurations. The dominant masses in the case of the nuclear power source are generally those for the main radiator required to reject waste heat from the thermodynamic cycle, and shielding required to protect equipment and personnel against ionizing radiation from the reactor. Solar radiation as the power source does not require shielding. It can be used to provide electric power directly through photovoltaic (solar) cells or indirectly through a solar collector–heat exchanger system similar to that for a nuclear system (Fig. 1). A single thermodynamic-cycle working fluid suffices, but power levels are generally considered limited to 100 kW by collector-size limitations. Radiant energy beamed to a spacecraft at nonoptical frequencies (for example, as microwaves) and converted to electrical power in appropriate photovoltaic cells may also be attractive in some applications. Solar cells are a simple and useful power source for propelling small probes into geolunar space, the inner solar system, and in some instances as far as Jupiter. Their principal disadvantage is low voltage. Power-conditioning equipment, to achieve the high voltage required for ion propulsion, adds to the system mass, although considerable advances continue to be made in reducing the complexity and specific mass (mass per unit power) of solar cells and power-conditioning equipment.
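To put the ~100-kW solar limit mentioned above in perspective, the following sketch (an illustration with assumed numbers, not figures from the article) estimates the collector area needed for a given electrical power at 1 AU from the Sun, given an assumed end-to-end conversion efficiency.

```python
# Hedged sketch: solar collector/array area needed for a given electric power.
# Solar flux at 1 AU ~1.36 kW/m^2; the conversion efficiency is an assumption.

SOLAR_FLUX_1AU = 1360.0   # W/m^2

def collector_area(p_electric_W, efficiency, flux=SOLAR_FLUX_1AU):
    """Collector area (m^2) for the requested electrical output."""
    return p_electric_W / (efficiency * flux)

for p_kw in (10, 100):
    a = collector_area(p_kw * 1e3, efficiency=0.15)   # 15% assumed
    print(f"{p_kw:>4} kW electric -> ~{a:,.0f} m^2 of collector at 1 AU")
# At an assumed 15% efficiency, 100 kW already implies roughly 500 m^2 of
# collector, which illustrates why collector size limits solar-electric power.
```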
Fig. 1. Nuclear-powered electrostatic (ion) propulsion system.
Solar cells are damaged by exposure to particle radiation environments, which limits their useful lifetimes to values that may be tolerable in the Earth's radiation belt but are probably too low in the harsher environments at Jupiter or near the Sun. As solar-powered generators approach the Sun, the energy flux density incident upon their collectors increases, their specific mass decreases, and they become increasingly attractive. Unfortunately, the range of distances from the Sun over which they are truly attractive is not large, because of particle radiation close to the Sun. Additional disadvantages of solar energy systems include the requirement to maintain a collector–concentrator alignment relative to the Sun, the loss of power in the shadow of a celestial body, and the difficulty of collector maintenance in an environment containing micrometeoroids. See SOLAR CELL; SOLAR ENERGY.
Power conversion. If the power-generation system involves a nuclear reactor or a solar-thermal subsystem, thermal-to-electric conversion subsystems are required. Those most highly developed involve thermodynamic conversion cycles based on turbine generators. Although most traditional systems have operated on the Brayton gas cycle or the Rankine vapor cycle, more recent efforts include the Stirling gas-cycle system. In the Rankine cycle, a liquid is heated and evaporated either in a heat exchanger (two-loop system; Fig. 1) or directly in a nuclear reactor or solar-heated boiler (single-loop system). The vapor is then expanded through the turbine, condensed in the main radiator, and pumped back into the heat exchanger, reactor, or boiler. Electrical power is drawn from the generator which is driven by the turbine. Where there is concern about radioactivation of the working fluid (low-melting-temperature metals typically), the two-loop system is used and the heat exchanger serves to limit the spread of radioactivity. Heat transfer is complicated by the weightless conditions of space. Other thermodynamic cycles such as the Brayton or Stirling cycles, which do not require vaporization and condensation, mitigate this problem, but they typically have lower efficiencies. See BRAYTON CYCLE; GAS TURBINE; GENERATOR; RANKINE CYCLE; STIRLING ENGINE.
The main radiator, which rejects the waste heat from the conversion subsystem to space, is the largest and heaviest of the power-generation-subsystem components. Its size and mass are directly proportional to the thermal power rejection rate. High efficiency is critical to low radiator mass, but operating temperature, physical configuration, and micrometeor protection are also important.
Alternative methods of power conversion employ thermoelectric (semiconducting thermocouples) and thermionic devices to effect direct conversion of heat to electricity. These devices have a history of lower efficiencies than fluid cycle systems operating between the same temperatures, but they are typically more compact and, because they typically have no moving parts, they are potentially more reliable. See THERMIONIC POWER GENERATOR; THERMOELECTRICITY.
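The statement that radiator size scales with the thermal power rejected can be made concrete with a simple energy balance. The sketch below is illustrative only (the efficiency, emissivity, and radiating temperature are assumed values); it uses P_reject = P_electric(1 − η)/η and the Stefan–Boltzmann law.

```python
# Hedged sketch: waste heat and ideal radiator area for a space power system.
# Assumes a single-sided, gray-body radiator; all numbers are illustrative.

SIGMA = 5.67e-8          # Stefan-Boltzmann constant, W/(m^2 K^4)

def radiator_area(p_electric_W, efficiency, T_K, emissivity=0.85):
    """Radiator area (m^2) needed to reject the cycle's waste heat."""
    p_reject = p_electric_W * (1.0 - efficiency) / efficiency
    return p_reject / (emissivity * SIGMA * T_K**4)

p_e = 100e3                      # 100 kW electric (assumed)
for eta in (0.10, 0.20, 0.30):   # assumed cycle efficiencies
    a = radiator_area(p_e, eta, T_K=600.0)
    print(f"efficiency {eta:.0%}: radiator ~{a:6.1f} m^2 at 600 K")
# Higher conversion efficiency and higher radiating temperature both shrink
# the radiator, consistent with the trade-offs described in the text.
```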
Propellants. Ion-thruster propellants are selected according to performance, technical suitability, availability, cost, and storage characteristics. For geospace transportation, the effect of ions that may spiral along the Earth's magnetic field lines to the magnetic poles must also be considered. High thruster efficiency requires a high-molecular-mass propellant which facilitates operation at high utilization efficiency where the loss of unaccelerated propellant is small. A high molecular mass also facilitates operation at a high acceleration voltage, which assures a high thruster electrical efficiency. Low first-ionization potential and high first-excitation potential for the propellant are important. Such potentials assure minimal radiation losses and low ion-production power. High thrust-to-power ratio is also promoted by a high propellant molecular mass. Operation at this condition is desirable because spacecraft are usually power-limited. Thruster lifetime can be extended by a high second ionization potential and a low charge-exchange cross section to limit the rate at which doubly charged and charge-exchange ions erode thruster surfaces. Compatibility of thruster with spacecraft materials and functions is essential. Propellants that are, for example, likely to interfere with the operation of the thruster or spacecraft systems by forming coatings are undesirable. Operational controllability is important. Monatomic propellants are preferred because they ionize in a predictable way and the thrust created by their expulsion is, therefore, more readily controlled. Propellants that have been investigated include argon, xenon, cesium, mercury, and fullerenes such as C60 (see table). Although mercury received most of the early attention, xenon is now being used on all space missions because of toxicity concerns with mercury. Cesium is not used because it tends to migrate and coat surfaces on components such as insulators. C60, which is attractive because of its high molecular mass, is probably unacceptable because it tends to fragment and to form negative ions, which degrade thruster performance.
Thrust device. Ion or electrostatic thrust devices contain three functional elements: an ionizer that generates the ions; an accelerator providing an electric field for accelerating the ions and forming them into a well-focused beam; and a neutralizer or electron emitter that neutralizes the electrical charge of the exhaust beam of ions after they have been ejected. Electrostatic thrusters are classified according to the type of ions they eject and by the method utilized to produce them.
Ionizer. The positive ions needed for acceleration are produced in a strong electric field, by contact with a surface having a work function greater than the ionization potential of the propellant, or by electron-bombardment ionization. The last method has received the most attention and appears to be the most promising.
Ion propellant characteristics

Propellant   Atomic weight   Availability   Storability          Spacecraft thruster      Ionization potentials, eV   Thruster technology
                                                                  material compatibility   1st        2d               status
Argon        39.95           Good           Cryogenic liquid     Excellent                15.76      27.5             Ground tests
Xenon        131.3           Scarce         High-pressure gas    Excellent                12.13      21.2             Space flight
Cesium       132.9           Limited        Solid                Poor                      3.90      23.4             Space flight
Mercury      200.6           Good           Liquid               Good                     10.44      18.8             Space flight
Fullerene    720             Good           Solid                Marginal                  7.6       ?                Preliminary tests
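As a rough illustration of why a low first-ionization potential matters, the sketch below (illustrative, not from the article) converts the table's first-ionization potentials into the minimum power spent just ionizing the propellant for a 1-A beam of singly charged ions; real discharge losses are considerably higher.

```python
# Hedged sketch: lower bound on ion-production power from first-ionization
# potentials (taken from the table above). For singly charged ions, one ampere
# of beam is one coulomb of ions per second, so the minimum ionization power
# is simply current x ionization potential (in volts). Real ion sources spend
# several times this because of excitation and wall losses.

FIRST_IONIZATION_EV = {      # eV, as listed in the table
    "argon": 15.76,
    "xenon": 12.13,
    "cesium": 3.90,
    "mercury": 10.44,
    "fullerene": 7.6,
}

beam_current = 1.0           # A (assumed)
for name, ev in FIRST_IONIZATION_EV.items():
    p_min = beam_current * ev        # watts, since 1 eV per elementary charge = 1 V
    print(f"{name:>9}: >= {p_min:5.2f} W per ampere of beam")
# Cesium's very low ionization potential is one reason it was studied early,
# even though other drawbacks (coating of surfaces) ruled it out.
```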
In an electron-bombardment ionizer, high-energy electrons collide with relatively massive neutral atoms or heavy molecules like fullerenes being supplied in the discharge chamber and induce ionization (Fig. 2). In the direct-current ionizer, the electrons are accelerated from a cathode to kinetic energies of a few tens of electronvolts via a potential difference sustained by an anode. These electrons are usually obtained from a hollow cathode which has the long lifetime, high reliability, and low power consumption needed for space operation. Various magnetic-field configurations, which are usually induced by permanent magnets, are employed to confine the electrons away from the anode until after they have expended most of their kinetic energy in collisions with neutral atoms or molecules. The magnetic field also serves to confine and direct the ions produced toward the accelerating electrodes. This class of thruster has been operated in space with cesium, mercury, and xenon propellants.
In another type of electron-bombardment ionizer, electrons that result from the ionization process itself are accelerated in high-frequency electric fields induced within the discharge chamber by external antennae. These electrons acquire sufficient kinetic energy to ionize propellant atoms; they then migrate toward an electrode where they are collected. In contrast to dc ionizers, these use high-frequency
Fig. 2. Direct-current electron-bombardment ion thruster.
rather than direct-current power to accelerate electrons and require no cathode. See ION SOURCES.
Accelerator. Some of the ions produced are directed toward the ion-accelerating subsystem, which typically consists of two plates containing large numbers of aligned hole pairs (Fig. 2). The upstream plate and the body of the ionizer are maintained at a potential V (in volts) positive with respect to the space downstream from the thruster, whereas the downstream plate is biased negative at a smaller value. The positive ions are therefore first accelerated between the plates and then decelerated after they pass through the downstream plate until they have been accelerated through the net (or beam) voltage V. In order to ensure that ions do not strike the downstream plate directly, hole size and alignment and plate spacing must be set and maintained properly. For a high extracted ion current density, the plates should be as close together as possible.
Charge neutralization. Electrons are released near the ion beam downstream from the accelerator at the same rate as the ions. These electrons neutralize the space charge of the positive ions and prevent gross spacecraft charging. They are prevented from migrating upstream into the positively biased ionizer by the negatively biased downstream accelerator plate. A hollow cathode, similar to the one used as an electron source in the dc ionizer but optimized for low propellant flow rate, is used as a neutralizer because of its high efficiency and long life.
Stationary plasma thruster. This thruster combines the functions of ionization, acceleration, and neutralization in a simple open chamber without a planar electrode pair. It utilizes a magnetic field to confine electrons while ions are accelerated electrostatically, and current designs have a limited specific impulse capability (1000–2000 s). See SPECIFIC IMPULSE.
Applications. Ion propulsion is characterized by high specific impulse and low thrust. Thrusters have generally been designed for operation in the 2500–10,000-s specific impulse range, but the upper limit could be extended. Because high specific impulse means low propellant consumption, ion propulsion is attractive for a wide variety of applications.
Satellites. One functional category includes the use of ion thrusters on satellites for orbit control (against weak perturbation forces) and for station keeping (maintaining a satellite's position in a given orbit). Substantial commercial use of ion thrusters in this application began at the end of the twentieth century.
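The relations implied above — ions falling through the net beam voltage V, the resulting exhaust velocity and specific impulse, and the propellant savings that follow — can be illustrated with a short calculation. The sketch below is a simplified, idealized model (100% propellant utilization, singly charged xenon, assumed beam voltage and mission delta-v), not a description of any particular thruster.

```python
import math

# Hedged sketch: idealized ion-thruster performance from the beam voltage V.
# v_ex = sqrt(2*q*V/m); Isp = v_ex/g0; thrust per unit beam power = 2/v_ex.
# Propellant fraction from the rocket equation: 1 - exp(-dv/(Isp*g0)).

Q_E = 1.602e-19        # elementary charge, C
AMU = 1.6605e-27       # atomic mass unit, kg
G0 = 9.81              # m/s^2

def ion_performance(beam_voltage_V, mass_amu):
    m = mass_amu * AMU
    v_ex = math.sqrt(2.0 * Q_E * beam_voltage_V / m)   # m/s
    isp = v_ex / G0                                    # s
    thrust_per_watt = 2.0 / v_ex                       # N per W of beam power
    return v_ex, isp, thrust_per_watt

def propellant_fraction(delta_v, isp):
    return 1.0 - math.exp(-delta_v / (isp * G0))

v_ex, isp, t_per_w = ion_performance(beam_voltage_V=1000.0, mass_amu=131.3)  # xenon
print(f"exhaust velocity ~{v_ex/1e3:.1f} km/s, Isp ~{isp:.0f} s, "
      f"thrust ~{t_per_w*1e3:.3f} mN per W of beam power")

dv = 4000.0   # m/s, an assumed mission delta-v
print(f"propellant fraction for {dv/1e3:.0f} km/s: "
      f"ion {propellant_fraction(dv, isp):.1%} vs "
      f"chemical (Isp 300 s) {propellant_fraction(dv, 300.0):.1%}")
```

With these assumed numbers a 1000-V xenon beam gives a specific impulse of a few thousand seconds, squarely in the 2500–10,000-s range quoted in the text, and an order-of-magnitude smaller propellant fraction than a chemical rocket for the same maneuver.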
An ion propulsion system can also be used advantageously for changing the satellite's position in a given orbit, especially shifting a satellite to different longitudes over the Earth in an equatorial geostationary orbit. In current systems, the satellite's reaction control system is fired for 2–3 min, causing the satellite to drift slowly to a different place, where it is stopped by a second reaction control system maneuver. The slow drift can be time-consuming, several times the few days required by an ion thruster for a 180◦ shift. See SATELLITE (SPACECRAFT); SPACECRAFT PROPULSION.
Orbit raising and related missions. Ion propulsion in geospace can be advantageous, but it is complicated by three factors: strong gravity forces, which typically result in near-Earth to geosynchronous transfer times of the order of 100 days depending on available power-to-vehicle mass ratio; the Van Allen radiation belt; and eclipsing. The last two factors are problematical when the Sun's power is used, especially when silicon photovoltaic cells, which suffer radiation damage in the Van Allen belt, are involved. The exclusive use of nuclear power or the use of a reusable high-thrust interorbital stage to boost the solar-electric stage from near-Earth to an orbit above the most damaging region of the Van Allen belt (18,000 km or 11,000 mi) can eliminate the last two concerns. Orbit plane changes accomplished along with orbit raising are also attractive, although they increase transfer times. Geospace missions might involve delivery of cargo, single- and multiple-satellite servicing, and space debris retrieval. This last mission is fast becoming important as the probability of collisions with debris, especially at the geosynchronous altitude, increases. See VAN ALLEN RADIATION.
Geolunar transfer. Earth-Moon transfer can be accomplished with ion propulsion starting at any place in the near-Earth to geosynchronous altitude range, but the complicating factors cited above make the higher-altitude starting point more attractive.
Interplanetary transfer. A major functional application of ion propulsion is interplanetary transfer. Here, thrust has to overcome only very weak solar gravitational forces. Because of this, and the long powered flight times of which ion propulsion is capable, transfer times to Venus or Mars need not be longer than transfer times in comparable flights with high thrust drives capable only of short powered flight. At the very large distances to objects in the outer solar system, ion propulsion would yield shorter transfer times than chemical and most high-thrust nuclear concepts. The National Aeronautics and Space Administration (NASA) Deep Space 1 mission, started at the end of the twentieth century, uses a 30-cm-diameter (12-in.) xenon ion thruster to propel a spacecraft to encounters with the asteroid Braille (in 1999) and the comet Borrelly (2001). See ELECTROTHERMAL PROPULSION; INTERPLANETARY PROPULSION; PLASMA PROPULSION; SPACE PROBE. Paul J. Wilbur
Bibliography. G. R. Brewer, Ion Propulsion, 1970; M. S. El-Genk et al., System design optimization for multimegawatt space nuclear power applications,
J. Propuls., Power, 6:194–202, 1990; H. R. Kaufman, Technology of electron-bombardment ion thrusters, Adv. Electr. Electr. Phys., 36:273–275, 1974; E. J. Lerner, Plasma thrusters from Russia, Aerosp. Amer., 30(9):51, September 1992; H. Mark and H. J. Smith, Fast track to Mars, Aerosp. Amer., 29:36–41, 1991; A. K. Sinha, B. Agrawal, and W. W. Wu, Trends in satellite communications technology, techniques and applications, Int. J. Satel. Commun., 8:283–294, 1990; P. J. Wilbur, J. R. Beattie, and J. Hyman, Approach to the parametric design of ion thrusters, J. Propuls., Power, 5:575–583, 1990; P. J. Wilbur, R. G. Jahn, and F. C. Curran, Space electric propulsion plasmas, IEEE Trans. Plasma Sci., 19:1167–1179, 1991; P. J. Wilbur, V. K. Rawlin, and J. R. Beattie, Ion thruster development trends and status in the United States, J. Propuls., Power, 14:708–715, 1998.
Ion-selective membranes and electrodes Membrane-based devices, involving permselective, ion-conducting materials, used for the measurement of activities of species in liquids or partial pressures in the gas phase. Permselective means that ions of one sign may enter and pass through a membrane.
Properties. Ion-selective electrodes are generally used in the potentiometric mode, and they superficially resemble the classical redox electrodes of types 0 (inert), 1 (silver/silver ion; Ag/Ag+), 2 (silver/silver chloride, chloride ion; Ag/AgCl/Cl−), and 3 (lead/lead oxalate/calcium oxalate/calcium ion; Pb/PbC2O4/CaC2O4/Ca2+). The last, while ion-selective, depends on a redox couple (electron exchange) rather than ion exchange as the principal origin of interfacial potential difference. Ion-selective electrodes have a typical form that can be expressed in shorthand form, the "ionic-contact" membrane configuration: [lead wire (Cu; Ag); inner reference electrode (AgCl); inner filling solution (electrolyte: M+Cl−); membrane permselective to M+] and the "all-solid-state" membrane configuration [Cu; Ag membrane permselective to M+]. In the former, both membrane interfaces are ion-exchange-active, and the potential response depends on M+ activities in both the test solution and the inner filling solution. In the latter, the membrane must possess sufficient electronic conductivity to provide a reversible, stable electron-exchange potential difference at the inner interface, with ion exchange only at the test solution side.
Potentiometric responses of ion-selective electrodes take the form in Eq. (1) when an ion-selective electrode is used with an external reference electrode, typically a saturated-calomel reference electrode with a salt bridge, to form a complete electrochemical cell.

V(measured) = V0 + (RT/F) ln [ai^(1/zi) + Σj (kij aj)^(1/zj)]    (1)

where
R = the universal gas constant
T = the absolute temperature
F = the Faraday constant (96,487 coulombs/equivalent)
V0 = formal reference potential

Activities of the principal ion, ai, and interfering ions, aj, are in the external "test" solution and correspond to ions Mzi and Mzj, where zi and zj are charges with sign. The ion Mzi is written first because it is the principal ion favored in the membrane, for example, high ion-exchange constant and high mobility. Values known as selectivity coefficients (kij) are experimentally determined, but they can be related to extrathermodynamic quantities such as single-ion partition coefficients and single-ion activity coefficients and mobilities. When only one ion is present in a test solution, or the membrane is ideally permselective for only one ion, this equation simplifies to Eq. (2).

V(measured) = V0 + (RT/ziF) ln ai    (2)
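The working of Eqs. (1) and (2) can be illustrated numerically. The sketch below is illustrative only (the activities, selectivity coefficient, and reference potential are assumed values); it evaluates the Nikolsky-type response of Eq. (1) and the ideal Nernstian slope of Eq. (2) at 25°C.

```python
import math

# Hedged sketch: evaluate Eqs. (1) and (2) for assumed, illustrative inputs.
R = 8.314          # J/(mol K)
F = 96487.0        # C/equivalent (value used in the article)
T = 298.15         # K (25 deg C)

def nikolsky_potential(a_i, z_i, interferents, V0=0.0):
    """Eq. (1): V = V0 + (RT/F) ln[a_i**(1/z_i) + sum((k_ij*a_j)**(1/z_j))].

    interferents is a list of (k_ij, a_j, z_j) tuples.
    """
    term = a_i ** (1.0 / z_i) + sum((k * a) ** (1.0 / z) for k, a, z in interferents)
    return V0 + (R * T / F) * math.log(term)

def nernst_potential(a_i, z_i, V0=0.0):
    """Eq. (2): V = V0 + (RT/(z_i*F)) ln a_i."""
    return V0 + (R * T / (z_i * F)) * math.log(a_i)

# A K+ electrode with an assumed selectivity k(K,Na) = 1e-4 in a Na+ background.
v = nikolsky_potential(a_i=1e-3, z_i=1, interferents=[(1e-4, 0.1, 1)])
print(f"Eq. (1) response: {v*1e3:.1f} mV vs reference")

# Ideal slope: change per tenfold (decade) change of activity, from Eq. (2).
slope = (nernst_potential(1e-2, 1) - nernst_potential(1e-3, 1)) * 1e3
print(f"Nernstian slope: {slope:.2f} mV per decade (59.1 mV expected at 25 C)")
```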
V0 can be written explicitly in terms of activities of species at the inner interface or in terms of solid-state activities for the all-solid-state configuration. Equation (1), variously known as the Horovitz, Nicolsky, or Eisenman equation, resembles the Nernst equation (2), but the former originates from different factors. Equation (1) cannot be derived from first principles for ions of general charge. However, when zi = zj, the equation can be derived by various means including, mainly, thermodynamic (Scatchard) equations and transport models (Nernst-Planck equations). The response slope 2.303RT/ziF is 59.14/zi mV per decade of activity at 77◦F (25◦C). Measurements reproducible to ± 0.6 mV are typically achieved, and activities can be reproducible to ± 2% for monovalent ions. A normal slope is considered nernstian, and can persist over a wide activity range, especially for solid electrodes, for example, 24 decades in the Ag+,S2− system, and 12 decades of H+ using Li+-based glass membrane electrodes. Less than nernstian slopes, and in the limit zero slope, can occur at low activities of sensed species, and can occasionally occur at very high activities. Ultimate low-level response (detection limit) is determined by the solubility of the membrane ion-exchanging material, although impurities may cause premature failure. Because of the logarithmic dependence of potential response on activities, activity measurements using ion-selective electrodes are particularly suited to samples with wide activity variations. Standardization against pure samples or samples of the type to be determined is required. Precise measurements of concentrations over narrow ranges are not favorable, but are possible by elaborate standardization schemes: bracketing, standard additions, and related methods involving sample pretreatment. See DONNAN EQUILIBRIUM.
Ion-selective electrodes are most often cylindrical, 6 in. (15 cm) long and 0.25 or 0.5 in. (6 or 13 mm) in diameter, with the lead wire exiting at the top and the membrane sensor at the lower end. However, inverted electrodes for accommodating
single-drop samples, and solid electrodes with a drilled hole or a special cap for channeling flowing samples past a supported liquid membrane, are possible configurations. The conventional format is intended for dip measurements with samples large enough to provide space for an external reference electrode. Single combination electrodes are useful for smaller samples, because the external electrode is built-in nearly concentrically about the ion-selective electrode. Drilled and channeled-cap electrodes are intended for use with flowing samples. Microelectrodes with membrane-tip diameters of a few tenths of a micrometer have been constructed for single-cell and other measurements in the living body (see illus. for construction details). Electrodes can also be flat-form stacks of active layers on an inert support, that is, Ag/AgCl/KCl (aqueous gel)/membrane. Membranes can also be placed on an inert metal gate or insulator of a field-effect device. These flat devices are susceptible to miniaturization by methods of silicon chip technology. So-called smart sensors have the signal-processing electronics on the chip as well. Ion-selective electrodes are intended to be used to monitor and measure activities of flowing, or stirred, solutions because electrodes detect and respond to activities only at their surfaces. The time responses of solid and liquid membrane electrodes to ideal step activity changes of the principal ion (already present in the membrane) can be very rapid: 200 milliseconds for glass to about 30 microseconds for silver bromide (AgBr). Generally this fast response cannot be observed or used, because sample mixing or diffusion of a fresh sample to an electrode surface is the limiting process. Also, almost any time two ions are simultaneously determining response, interior diffusion potential generation is involved in reaching a new steady-state potential. Similarly, formation of hydrated surface layers or layers of adsorbed matter or inhomogeneities introduces diffusion barriers. Response times from 2 to 20 s can be expected. About 20–30 samples per hour may be analyzed manually by the dip method, and about 60 per hour when samples are injected into a flowing stream of electrolyte. See ELECTRODE POTENTIAL; ELECTROLYTIC CONDUCTANCE.
Classification and responsive ions. Ion-selective electrodes are classified mainly according to the physical state of the ion-responsive membrane material, and not with respect to the ions sensed. It has also proved superfluous to distinguish between homogeneous membranes and those that are made from a homogeneous phase supported physically in voids of an inert polymer, or from two homogeneous phases intimately mixed, so-called heterogeneous membranes.
Glass membrane electrodes. These are used for hydrogen-ion activity measurements. Glass electrodes are based on alkali ion silicate compositions. Superior pH-sensing glasses (pH 1 to 13 or 14) result from lithium silicates with addition of di-, tri-, and tetravalent heavy-metal oxides. The latter are not chain formers. Membranes responsive to
Diagrams of ion-selective electrodes. (a) Typical electrode configuration—in this example, an all-solid-state ion-selective electrode. (b) Enlarged view of the construction for metal-contacted-membrane ion-selective electrodes. (c) Enlarged view of the construction for internal electrolyte-contacted-membrane ion-selective electrodes. (d) Enlarged view of the construction for an electrode. (e–g) Enlarged views of constructions for liquid ion-exchanger-membrane ion-selective electrodes. (h) An inverted electrode microcell using a fluoride-sensing material; the reference electrode (external) is saturated calomel [SCE] (after R. A. Durst and J. K. Taylor, Anal. Chem., 39:1483, 1967). (i) Construction of a flow-through crystal electrode (after H. I. Thompson and G. A. Rechnitz, Anal. Chem., 44:300, 1972). (j) A combination electrode illustrating the usual active membrane surrounded by an attached Ag/AgCl external reference electrode. (k) An example of a cation-sensing microelectrode used in biological research (after R. N. Khuri, W. J. Flanagan, and D. E. Oken, J. Appl. Physiol., 21:1568, 1966).
the sodium (Na+), potassium (K+), and ammonium (NH4+) cations, and some other cations use additional aluminum oxide (Al2O3), boric oxide (B2O3), or both. The pH glasses are highly selective for hydrogen ion (H+) over other monovalent ions. The Na+-sensing glasses are not intrinsically very selective for Na+ over H+, but useful pNa measurements can be made, even in excess K+ at pH 7 or above. No glasses with high selectivity of K+ over Na+ have been found. Chalcogenide glasses containing low contents of copper(II) [Cu2+] or iron(III) [Fe3+], while called glasses, are thought to be semiconductor electrodes with a high component of electron exchange, rather than ion exchange, for establishment of interfacial potential responses to Cu2+ and Fe3+.
Electrodes based on water-insoluble inorganic salts. These electrodes include sensors for the ions fluoride (F−), chloride (Cl−), bromide (Br−), iodide (I−), cyanide (CN−), thiocyanate (SCN−), sulfide (S2−), Ag+, Cu2+, cadmium (Cd2+), and Pb2+. The compounds used are silver salts, mercury salts, sulfides of Cu, Pb, and Cd, and rare-earth salts. All of these are so-called white metals whose aqueous cations (except lanthanum; La3+) are labile. The salts are Frenkel-defect solids which possess the necessary ionic conductivity. Agi+ (interstitials) or Agi− (vacancies) are the mobile species in the silver salts, while F− interstitials are mobile in LaF3. These materials are ion exchangers, and show no diffusion potential. Single crystals, doped and undoped, may be used as membranes. Pressed pellets using inert binders such as polyethylene or an insoluble salt such as Ag2S (for the silver halide electrodes) are popular. In addition, powdered salts may be suspended in silicone rubber or polyvinyl chloride (about 50:50% by weight) to form heterogeneous flexible membranes. CuS-Ag2S, CdS-Ag2S, and PbS-Ag2S pressed pellets formed at about 480◦F (250◦C) are indirectly responsive to the divalent metal ion activities through control of Ag+ activities at the electrode surface and in leached layers or surface pores by means of the common ion effect.
Electrodes using liquid-ion exchangers. These are electrodes supported in the voids of inert polymers such as cellulose acetate, or in transparent films of polyvinyl chloride, and provide extensive examples of devices for sensing. Fewer cation-sensing liquid-ion exchanger systems have been found. The principal example (and among the most important) is the Ca2+-responsive electrode based on calcium salts of diesters of oil-soluble phosphonic acids. Oil-soluble metal salts of hydrophobic acids containing chelating nitrogen (N) and sulfur (S) groups can be used in electrodes for heavy-metal ion sensing. The condition for development of an ion-selective potential signal is rapid exchange of the sensed metal ion between membrane (dissolved chelate or salt) and the same ion in the test solution. If this exchange is slow, as in magnesium ion (Mg2+) exchange between porphyrins and solutions, the sensor is anion-selective. Anion-sensing electrodes typically use an oil-soluble cation Aliquat (methyltricaprylammonium)
or a metal ion-uncharged organic chelating agent (Ni2+ or Fe2+ phenanthroline or substituted phenanthroline cations) in a support matrix. Sensitivity is virtually assured if the salt is soluble in a mediator solvent, typically a nitro aromatic or esters of difunctional carboxylic acids: adipic, sebacic, or phthalic. Selectivity poses a severe problem since these electrodes, based on hydrophobic materials, tend to respond favorably to many oil-soluble anions. Thus construction of electrodes for the simple inorganic anions F−, hydroxide (OH−), bicarbonate (HCO3−), and hydrogen phosphate (HPO42−) is difficult. Yet many electrodes respond to SCN−, I−, Br−, nitrate (NO3−), perchlorate (ClO4−), and fluoroborate (BF4−) in accordance with the Hofmeister lyotropic series. Surfactant anion sensors use salts such as hexadecylpyridinium dodecylsulfate in o-dichlorobenzene; surfactant cation sensors use a picrate salt of the species to be measured. Acetylcholine may be measured in the presence of choline, Na+, and K+ using the tetra-p-chlorophenylborate salt in a phthalate ester in polyvinyl chloride, for example. Cation and anion drug sensors can be made by dissolving a cation pair (with oil-soluble anion such as tetraphenylborate) in a plasticized polyvinyl chloride membrane. Anion drug sensors often use methyltridodecylammonium ion (Aliquat) to form the oil-soluble ion pair.
Neutral carrier-based sensors for monovalent and divalent cations are closely related to ion-exchanger-based electrodes. Both systems involve ion-exchange sites, particularly negative mobile sites arising from mediators and negative fixed sites arising from support materials. Some of the available neutral carriers are hydrophobic complex formers with cations, and they can be either cyclic or open-chain species that form ion-dipole complexes. These compounds permit selective extraction (leading to permselectivity) for ions such as K+, Na+, NH4+, and Ca2+ that would ordinarily not dissolve as simple inorganic salts in the hydrocarbonlike membrane phase. Valinomycin is the best-known example, and its use in supported solvents such as dioctylsebacate provides an electrode with sensitivity of 10⁵ for K+/Na+. Neutral carriers that form adducts using weak covalent bonds are particularly useful for constructing selective anion sensors without Hofmeister series limitations. Trifluoroacetyl-p-butylbenzene, selective for carbonate (CO32−); bis(p-chlorobenzyl)tin dichloride, selective for HPO42−; and some special porphyrins are examples. See ION EXCHANGE.
Electrodes with interposed chemical reactions. These electrodes, with chemical reactions between the sample and the sensor surface, permit an enhanced degree of freedom in design of sensors for species which do not directly respond at an electrode surface. Two primary examples are the categories of gas sensors and of electrodes which use enzyme-catalyzed reactions. Gas sensors for carbon dioxide (CO2), sulfur dioxide (SO2), ammonia (NH3), hydrogen sulfide (H2S), and hydrogen chloride (HCl), and others can be made from electrodes responsive to H+, S2−, or Cl−. By enclosing a pH glass membrane in a thin
layer of dilute sodium bicarbonate (NaHCO3), an electrode for partial pressure of CO2 is formed, since H+ increases in a known way with increasing dissolved CO2. Similarly, immobilized enzymes convert a substrate such as urea or an amino acid to ammonia, which can be sensed and monitored by the underlying electrode. However, increased sensitivity is accompanied by an increased response time. Each diffusion and diffusion-reaction barrier slows the transport and increases the time constant of the overall sensor electrode. See ELECTRODE; ENZYME.
Applications. Electrodes for species identified above are, for the most part, commercially available. In addition, electrodes have been made and reported that are responsive to many other species. A few of these are cesium (Cs+), thallium (Tl+), strontium (Sr2+), Mg2+, zinc (Zn2+), nickel (Ni2+), uranyl (UO22+), mercury(II) [Hg2+], hydrogen sulfite (HSO3−), sulfate (SO42−), periodate (IO4−), perrhenate (ReO4−), halide anion complexes of heavy metals (for example, FeCl4−), pyridinium, pyrocatechol violet, vitamins B1 and B6, and many cationic drugs, aromatic sulfonates, salicylate, trifluoroacetate, and many other organic anions. Applications may be batch or continuous. Important batch examples are potentiometric titrations with ion-selective electrode end-point detection, determination of stability constants of complexes and speciation identity, solubility and activity coefficient determinations, and monitoring of reaction kinetics, especially for oscillating reactions. Ion-selective electrodes serve as liquid chromatography detectors and as quality-control monitors in drug manufacture. Applications occur in air and water quality (soil, clay, ore, natural-water, water-treatment, seawater, and pesticide analyses); medical and clinical laboratories (serum, urine, sweat, gastric-juices, extracellular-fluid, dental-enamel, and milk analyses); and industrial laboratories (heavy-chemical, metallurgical, glass, beverage, and household-product analyses). See ANALYTICAL CHEMISTRY; CHROMATOGRAPHY; QUALITY CONTROL; TITRATION. Richard P. Buck
Bibliography. P. L. Bailey, Analysis with Ion-Selective Electrodes, 1988; J. Koryta and K. Stulik, Ion-Selective Electrodes, 2d ed., 1984; E. Pungor (ed.), Ion-Selective Electrodes, vol. 5, 1989; R. L. Solsky, Analytical Chemistry, Fundamental Reviews, vol. 60, pp. 106R–113R, 1988.
Ion-solid interactions Physical processes resulting from the collision of energetic ions, atoms, or molecules with condensed matter. These include elastic and inelastic backscattering of the projectile, penetration of the solid by the projectile, emission of electrons and photons from the surface, sputtering and desorption of neutral atoms and ions, production of defects in crystals, creation of nuclear tracks in insulating solids, and electrical, chemical, and physical changes to the irradiated matter resulting from the passage
or implantation of the projectile (Fig. 1). Ion-solid interactions are also known as particle-solid interactions. When an energetic ion which has been produced in a plasma, a particle accelerator, or an astrophysical process impinges upon the surface of condensed matter, it experiences a series of elastic and inelastic collisions with the atoms which lie in its path. These collisions occur because of the electrical forces between the nucleus and electrons of the projectile and those of the atoms which constitute the solid target. They result in the transformation of the energy of the projectile into internal excitation of the solid. The precise nature of this excitation and the resulting physical processes are determined largely by the bombarding conditions. The principal determining factors are the species of ion, its energy and direction of incidence, the target composition and crystal structure, the target temperature, and the condition of the target surface. Such factors as the quantummechanical state of the projectile may also influence specific processes, especially if they are known to occur very near the target surface. Backscattering. One of the most basic interactions occurs when the projectile collides with a surface atom and bounces back in generally the opposite direction from which it came. This process is known as backscattering. It was first observed in 1911, when Ernest Rutherford bombarded gold foils with alpha particles from radioactive decay, and its observation led Rutherford to conclude that most of the matter in atoms is concentrated in a small nucleus. Now it is used as an analytical technique, Rutherford backscattering analysis, to measure the masses and locations of atoms on and near a surface. By measuring the energy and direction of a backscattered particle whose initial energy and direction are known, the mass of a surface atom that was struck can be inferred. Moreover, when the scattering occurs below the surface, small but predictable shifts in the measured energy can be used to infer the depth of the collision. For projectiles such as alpha particles with energies of a few MeV, the cross section for backscattering is well known. As a result, the fraction of backscattered particles is an absolute measure of the number of targets on the surface. This technique is most commonly performed with alpha particles of about 2 MeV. It is widely used in the study of other ion-solid phenomena and in analyzing the thickness and composition of thin layers, a measurement often of considerable value in research on semiconductor devices. See SCATTERING EXPERIMENTS (NUCLEI). Another backscattering technique, known as ionscattering spectrometry, uses projectiles with energies of perhaps 2 keV and thus achieves significantly greater surface specificity. However, results are more qualitative since the absolute cross sections for these collisions are not well known. Many of the advantages of Rutherford-backscattering and ion-scattering spectrometries can be combined by the use of ions in the 100–300-keV energy range. This medium-energy backscattering spectrometry has been made possible
Ion-solid interactions Key: atoms and ions from target (including sputtered ions) atoms and ions from beam (including implanted and backscattered ions) secondary electrons photons
Fig. 1. Processes that can occur when a material is subjected to particle bombardment.
energies increases less rapidly. This flattening is known as the density effect and is the result of the rapid rearrangement or polarization of the electrons in the medium by the increasingly strong transverse electric field of the projectile. This rearrangement partially shields the projectile and so slows the rate of increase of the stopping power, but the suddenness of the motion results in the emission of electromagnetic radiation. When the speed of the projectile
by advances in time-of-flight particle detection technology. Penetration phenomena. Although backscattering events are well enough understood to be used as analytical tools, they are relatively rare because they represent nearly head-on collisions between two nuclei. Far more commonly, a collision simply deflects the projectile a few degrees from its original direction and slows it somewhat, transferring some of its kinetic energy to the atom that is struck. Thus, the projectile does not rebound from the surface but penetrates deep within the solid, dissipating its kinetic energy in a series of grazing collisions. The capacity of a solid to slow a projectile is called the stopping power S(E), and is defined as the amount of energy lost by the projectile per unit length of trajectory in the solid. The stopping power is a function of the projectile’s energy and has different values for different projectiles. It is commonly measured in units of eV/cm or an equivalent unit of eV/(g/cm2) obtained by dividing the former value by the density of the solid. Stopping power is of central importance for many phenomena because it measures the capacity of a projectile to deposit energy within a thin layer of the solid. Since this energy drives secondary processes associated with penetration, the stopping power of the projectile is an important scaling parameter. In order to understand the graph of the stopping power of an energetic nucleus in matter as a function of its energy (Fig. 2), it is necessary to distinguish collisions involving the nuclei and the electrons of the target. When one charged particle moves past another at high velocity, the average force acting on the stationary particle is perpendicular to the trajectory of the moving particle. This force acts for a brief period of time and gives the stationary particle a small momentum. To simplify calculations, it is assumed that the target particle, whose mass is m, does not move very much during the collision and that the moving particle is not significantly deflected by it. This is called the impulse approximation. The magnitude p of the momentum acquired by the struck particle is independent of its mass so that the energy that it receives in the collision is (p)2/2m. Thus, an electron whose mass is about 1/2000 that of even a single nucleon will receive much more energy from a collision than a nucleus. The portion of the graph (Fig. 2) for which this simple explanation is satisfactory has a slope proportional to 1/E and is labeled the Bethe region. Here a projectile penetrates the target as a bare nucleus devoid of electrons because they have been stripped away in collisions. At relativistic velocities the electric field of the projectile becomes increasingly concentrated in the plane perpendicular to the trajectory where it can interact with the target more efficiently. As a result, the stopping power eventually reaches a minimum, called the ionization minimum, and then begins to rise again. For energies above the ionization minimum the stopping power increases, but at substantially higher
Fig. 2. Graph of the logarithm of the stopping power S(E) of a heavy ion in matter as a function of the logarithm of its energy E.
Ion-solid interactions exceeds the speed of light in the medium, this radiation is called Cerenkov radiation, and its direction is determined by the velocity of the projectile and the medium’s index of refraction. See CERENKOV RADIATION; RELATIVITY. As a projectile’s energy is lowered below the Bethe region, it begins to capture electrons from the medium. Very roughly, a new electron is captured when the velocity of the projectile in the medium is comparable to the velocity of the electron in its quantum-mechanical orbit about the projectile. The presence of these captured electrons surrounding the penetrating nucleus shields it from the electrons of the medium and thus reduces the stopping power, resulting in a peak in the curve (Fig. 2). However, collisions with target electrons continue to dominate the stopping power below this peak. This portion of the graph is labeled the Lindhard region and has a slope proportional to E1/2. For projectiles with energies typically below a few hundred keV, the collisions are not simply with electrons but are elastic and inelastic collisions between whole atoms. This is called nuclear stopping since the nuclei of struck atoms acquire significant amounts of kinetic energy. The stopping mechanisms for ions in solids result in their having well-defined ranges. This behavior is distinctly different from the penetration of photons (light, x-rays, and so forth) in matter. Photon intensity in matter decreases exponentially with depth with a characteristic length, but individual photons at any depth have the same energy as they had at the surface. Thus, photons are destroyed but do not lose energy, while ions lose energy but are not destroyed. In crystalline solids it is possible to pick certain directions where the electron density is significantly lower. Ions penetrating in these channels between crystal rows or planes experience fewer collisions than ions penetrating in other directions. In fact, because an ion’s potential energy increases in all transverse directions away from the axis of such a channel, the ion will tend to be constrained to move along it. This phenomenon, known as channeling, has been studied extensively. At higher energies the ion can be excited to higher quantum-mechanical states through its resonant coherent interaction with the channel. When the ion relaxes, it emits radiation. See CHANNELING IN SOLIDS. Radiation effects. The dissipation of a projectile’s kinetic energy in a solid can result in a number of secondary phenomena. Near the surface, electrons given sufficient energy may directly escape. These are often accompanied by Auger electrons and x-rays emitted by atoms which have lost inner-shell electrons through collisions. See AUGER EFFECT. In many insulating solids (including mica, glasses, and some plastics) the passage of an ion with a large electronic stopping power creates a unique form of radiation damage known as a nuclear track. The track is actually a cylindrical volume surrounding the trajectory of a single ionizing projectile which has been rendered much more susceptible to chemical attack. As a result, when the substance is chemically etched,
In crystalline solids it is possible to pick certain directions where the electron density is significantly lower. Ions penetrating in these channels between crystal rows or planes experience fewer collisions than ions penetrating in other directions. In fact, because an ion's potential energy increases in all transverse directions away from the axis of such a channel, the ion will tend to be constrained to move along it. This phenomenon, known as channeling, has been studied extensively. At higher energies the ion can be excited to higher quantum-mechanical states through its resonant coherent interaction with the channel. When the ion relaxes, it emits radiation. See CHANNELING IN SOLIDS.

Radiation effects. The dissipation of a projectile's kinetic energy in a solid can result in a number of secondary phenomena. Near the surface, electrons given sufficient energy may directly escape. These are often accompanied by Auger electrons and x-rays emitted by atoms which have lost inner-shell electrons through collisions. See AUGER EFFECT.

In many insulating solids (including mica, glasses, and some plastics) the passage of an ion with a large electronic stopping power creates a unique form of radiation damage known as a nuclear track. The track is actually a cylindrical volume surrounding the trajectory of a single ionizing projectile which has been rendered much more susceptible to chemical attack. As a result, when the substance is chemically etched, conical pits visible under an ordinary microscope are produced where ionizing particles have penetrated. This is particularly significant since the passage of single projectiles is registered and may be subsequently observed. Although the mechanism of track formation is still the subject of theoretical research, track-forming materials are widely used to detect ionizing radiation. See PARTICLE TRACK ETCHING.

When the projectiles have energies in the nuclear stopping range, a number of additional phenomena may occur. Of particular significance is the formation of a collision cascade. In the nuclear stopping region it is relatively likely that the projectile will transfer significant amounts of energy to individual target atoms. These atoms will subsequently strike others, and eventually a large number of atoms within the solid will be set in motion. This disturbance is known as a collision cascade. Collision cascades may cause permanent damage to materials, induce mixing of layers in the vicinity of interfaces, or cause sputtering if they occur near surfaces. The dynamical description of collision cascades is extremely complicated because of the large numbers of particles involved. Computer modeling and a branch of statistical mechanics called transport theory have been most successful, although both have limitations. The comprehensive mathematical description of collision cascades with arbitrary energy densities, realistic interatomic potentials, and consistent treatment of target surfaces is an active area of research. See BOLTZMANN TRANSPORT EQUATION; CRYSTAL DEFECTS; ION BEAM MIXING; RADIATION DAMAGE TO MATERIALS; SIMULATION; SPUTTERING.
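How much energy a single atom-atom collision can pass along is fixed by elementary two-body kinematics, which is what makes nuclear-stopping collisions so effective at seeding cascades. The sketch below evaluates the standard elastic-collision limit; the 100-keV argon projectile and the target choices are arbitrary illustrative numbers, not values from the article.

# Maximum energy an elastic collision can transfer from a projectile of
# mass M1 and energy E to a target atom of mass M2 (head-on collision):
#     T_max = 4 * M1 * M2 / (M1 + M2)**2 * E

def t_max(m1_amu, m2_amu, e_kev):
    """Maximum elastic energy transfer (keV) from projectile to target atom."""
    return 4.0 * m1_amu * m2_amu / (m1_amu + m2_amu) ** 2 * e_kev

E0 = 100.0                      # keV argon projectile (illustrative)
for target, m2 in (("C", 12.0), ("Si", 28.0), ("Cu", 63.5), ("Au", 197.0)):
    print(f"Ar (40 amu) on {target:2s}: T_max = {t_max(40.0, m2, E0):6.1f} keV")
# Comparable projectile and target masses give the largest transfer; a very
# light or very heavy target atom takes up a much smaller share of E.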
Although it is a much less ubiquitous phenomenon than collisional sputtering, particle emission from insulating surfaces can also be initiated by ions with energies in the electronic stopping region by mechanisms which are collectively termed desorption. Where sputtering is primarily a statistical process understandable in terms of classical mechanics, desorption is inherently quantum mechanical, involving specific electronic transitions. As a result, desorption can be initiated equally efficiently by ions, electrons, and photons. Very large and fragile biological molecules can be removed from surfaces virtually intact following the impacts of MeV heavy ions or pulsed laser irradiation. This has made it possible to mass-analyze a large number of biological molecules which could not previously be studied by conventional mass spectrometry because they were too fragile to be thermally evaporated. See DESORPTION.

Surface interactions. Much work in the field of ion-beam interactions with matter has dealt with effects on the material and the projectile which occur in the region of the surface. Projectiles with energies ranging from chemical bond energies to penetration energies can be used to study surface structure, prepare unique deposition layers, induce unusual chemical reactions, and even select specific reaction mechanisms. For example, bombardment of a surface by 150 keV C+ ions has been shown to be one route to the formation of diamondlike carbon. Similarly, by studying the atomic state of hyperthermal ions such as Li+ which scatter from the surface, information about both the structure of the surface and the nature of the interaction itself can be inferred, thus providing valuable additional insights to those which have been available previously from beam-foil spectroscopy. See BEAM-FOIL SPECTROSCOPY.

Complex and fascinating phenomena have been observed in low-energy ion-surface collisions. They include instances where ions appear to make multiple impacts in much the same way as a stone skips on the water. The electron-cyclotron-resonance (ECR) ion source has made it possible to study the impacts of extremely slow, yet very highly charged ions with surfaces. One result of this work is the recognition that at distances greater than 1 nm from the surface these nearly bare nuclei may capture significant numbers of electrons into their high-lying quantum states while lower-lying states may remain essentially empty. The resulting structures, called hollow atoms, may survive in this condition briefly after impact with the surface. See ION SOURCES.

Advances in computing have made it possible to simulate ion-surface impacts with far-reaching consequences for the development of intuition about these events (Fig. 3). Such simulations represent essential tools in the design and interpretation of experiments. See FULLERENE; MOLECULAR MECHANICS.

Applications. The most ambitious use of ion beams in industry is in the manufacture of integrated circuits. Ion implantation can be used to modify the electrical properties of semiconductors far more precisely than other techniques. As a result, much more elaborate structures can be produced. Ion implantation is an essential element of the manufacture of virtually all digital integrated circuits in use. See INTEGRATED CIRCUITS; ION IMPLANTATION.

Another commercial application of ion beams with significant promise is for materials modification. It has been demonstrated that ion implantation can greatly improve the wear and corrosion resistance of metals. One area that is expected to benefit from these advances is the technology of human prosthetic implants. See PROSTHESIS.

Ion-solid processes are used extensively as tools in many areas of research. They permit highly sensitive analyses for trace elements, the characterization of materials and surfaces, and the detection of ionizing radiation. Techniques employing them include secondary ion mass spectrometry (SIMS) for elemental analysis and imaging of surfaces, proton-induced x-ray emission (PIXE), ion-scattering spectrometry (ISS), medium-energy and Rutherford backscattering spectrometries (MEBS and RBS), and nuclear reaction analysis (NRA) for elemental and isotopic depth profiling. Ion-solid interactions are also fundamental to the operation of silicon surface-barrier detectors which are used for the measurement of particle radiation, and of nuclear track detectors which have been used in research as diverse as the dating of meteorites and the search for magnetic monopoles.
Fig. 3. Molecular dynamics simulation of a 250-eV C60+ molecular ion (buckminsterfullerene or buckyball) scattering from a hydrogen-terminated diamond surface. Atomic positions are shown at various times. (a) 0 fs. (b) 40 fs. (c) 100 fs. (d) 160 fs. (e) 201 fs. (f) 280 fs. (After R. C. Mowrey et al., Simulations of C60 collisions with a hydrogen-terminated diamond [111] surface, J. Phys. Chem., 95:7138–7142, 1991)
Among the notable products of ion-solid research are the development of ion-beam assisted deposition (IBAD) for the rapid production of thick protective coatings, and the demonstration of thin-film waveguides and waveguide lasers produced in insulators by ion implantation. See ACTIVATION ANALYSIS; CHARGED PARTICLE BEAMS; MAGNETIC MONOPOLES; PARTICLE DETECTOR; PROTON-INDUCED X-RAY EMISSION (PIXE); SECONDARY ION MASS SPECTROMETRY (SIMS); SURFACE PHYSICS. Robert A. Weller

Bibliography. R. Behrisch (ed.), Sputtering by Particle Bombardment, 3 vols., 1981, 1983, 1991; W. K. Chu, J. W. Mayer, and M. A. Nicolet, Backscattering Spectrometry, 1978; J. P. Eberhart, Structural and Chemical Analysis of Materials, 1991; L. C. Feldman and J. W. Mayer, Fundamentals of Surface and Thin Film Analysis, 1986; J. R. Tesmer et al. (eds.), Handbook of Modern Ion Beam Materials Analysis, 1995; J. F. Ziegler, J. P. Biersack, and U. Littmark, The Stopping and Range of Ions in Solids, 1985.
Ion sources Devices that produce positive or negative electrically charged atoms or molecules. See ION. In general, ion sources fall into three major categories: those designed for positive-ion generation, those for negative-ion generation, and a highly specialized type of source designed to produce a polarized ion beam. The positive-ion source category may further be subdivided into sources specifically designed to generate singly charged ions and those designed to produce very highly charged ions. Desirable qualities of an ion source are large yield, high ionization efficiency, low energy spread, high brightness, and low emittance. Practical considerations such as reliability, long source life, and ease of changing ion species are also important.

Ion sources have acquired a wide variety of applications. They are used in a variety of different types of accelerators for nuclear research; have application in the field of fusion research; and are used for ion implantation, in isotope separators, in ion microprobes, as a means of rocket propulsion, in mass spectrometers, and for ion milling. See ION IMPLANTATION; ION PROPULSION; ISOTOPE SEPARATION; MASS SPECTROSCOPE; NUCLEAR FUSION; PARTICLE ACCELERATOR; SECONDARY ION MASS SPECTROMETRY (SIMS).

Methods of Positive-Ion Formation

The principal methods of positive-ion formation are electron impact, surface ionization, spark discharge, laser ionization, field ionization, thermal ionization, and sputtering.

Electron impact. A common method of ionizing a gas or vapor is to pass high-velocity electrons through it, with the ions being formed as a result of electron-atom collisions. Electron energies are typically a few hundred electronvolts but in some special sources may be as high as 20 keV. An externally applied magnetic field is frequently used to cause the electrons to travel along a helical path and thereby increase the ionization efficiency. Examples of ion sources utilizing this concept are the duoplasmatron and the Penning ion source.

Positive surface ionization. Atoms possessing low ionization potentials can be ionized by allowing them to strike a heated surface having a high work function. Provided that the ionization potential of the atom is less than or about equal to the work function of the surface, there is a high probability that the atom will be thermally desorbed as a positive ion. The method is particularly well suited, though not entirely restricted, to producing ions of the alkali metals, all of which have ionization potentials of less than 5.4 eV. Some high work function metals are platinum (approximately 5.7 eV), tungsten (approximately 4.5–5.5 eV), and rhenium (approximately 5 eV). See IONIZATION POTENTIAL; WORK FUNCTION (ELECTRONICS).
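The rule of thumb that the ionization potential should not exceed the work function can be made quantitative with the Saha-Langmuir relation, which is not quoted in the article but is the standard description of surface ionization. The sketch below assumes a tungsten surface (work function taken as 4.5 eV), a statistical-weight ratio of 0.5 appropriate for alkali atoms, and an illustrative operating temperature.

import math

K_B_EV = 8.617e-5          # Boltzmann constant, eV/K

def surface_ionization_efficiency(ionization_potential_ev,
                                  work_function_ev,
                                  temperature_k,
                                  weight_ratio=0.5):
    """Fraction of desorbed atoms leaving the hot surface as positive ions,
    from ratio = (g+/g0) * exp[(W - I) / kT] and efficiency = ratio / (1 + ratio)."""
    ratio = weight_ratio * math.exp(
        (work_function_ev - ionization_potential_ev) / (K_B_EV * temperature_k))
    return ratio / (1.0 + ratio)

for name, ip_ev in (("Cs", 3.89), ("K", 4.34), ("Na", 5.14)):
    eff = surface_ionization_efficiency(ip_ev, 4.5, 1500.0)
    print(f"{name}: ionized fraction on hot tungsten ~ {eff:.2%}")
# Cesium, whose ionization potential lies below the work function, comes off
# almost entirely as ions; sodium, whose potential lies above it, leaves the
# surface mostly as neutral atoms.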
Spark discharge. There are several variations of this technique, but basically a spark is induced between two electrodes, one of which, at least, contains the element to be ionized. Generally speaking, the spark consists of a high-density, high-temperature plasma from which ions can be extracted. The spark can be produced by applying a high alternating potential between two fixed electrodes or by mechanically breaking contacting electrodes. See ELECTRIC SPARK.

Laser ionization. A focused beam from a high-power pulsed laser can be used to produce a small ball of dense plasma from essentially any solid, and positive ions can be extracted from this plasma. The high temperature of the plasma results in the formation of many multiply stripped ions and thus may prove a very effective method of generating highly charged positive ions. In principle, lasers or other strong sources of electromagnetic radiation can be used to produce ions by photoionization. A photon can directly ionize an atom if its energy exceeds the ionization potential of the atom. Unfortunately, the probability of photoionization is low (the cross section is of the order of 10⁻¹⁹ cm²), making it difficult to design efficient ion sources based on this process. See LASER.

Field ionization. If an atom passes close to or is adsorbed on a very sharp point where the electric field exceeds a few times 10¹⁰ V/m, there is a probability that it will be ionized; the phenomenon is known as field ionization. Such large electric fields can be achieved in the vicinity of a specially sharpened tungsten needle placed close to an annular electrode, and gas or vapor passing close to the tip of the needle can be ionized. Field emission or ionization is generally believed to be the underlying operating principle of a novel type of ion source known as the electrohydrodynamic source. In this source a conducting liquid, usually a metal, is allowed to be drawn down a fine-bore tube by capillary action. When an electric field is applied to the tip of the tube, the liquid meniscus, normally spherical, distorts and becomes conical. As the electric field is increased, the tip of the cone becomes sharper, and eventually the field at the tip becomes sufficiently large to cause field emission. See FIELD EMISSION.
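The field strengths quoted for field ionization can be related to practical tip geometries with the common field-emitter approximation E ≈ V/(k·r), where r is the tip radius and k ≈ 5 is a geometry factor. The applied voltage and the radii in the sketch below are invented illustrative values, chosen only to show when the few-times-10¹⁰ V/m threshold is reached.

def tip_field(voltage_v, tip_radius_m, k_factor=5.0):
    """Approximate apex field (V/m) of a sharpened needle at voltage_v,
    using the usual field-emitter estimate E ~ V / (k * r)."""
    return voltage_v / (k_factor * tip_radius_m)

V = 10.0e3                          # 10 kV applied to the needle (assumed)
for r_nm in (1000.0, 100.0, 50.0):  # assumed tip radii
    field = tip_field(V, r_nm * 1e-9)
    print(f"tip radius {r_nm:6.0f} nm -> apex field ~ {field:.1e} V/m")
# Only tips sharpened to roughly 100 nm or less reach the ~1e10 V/m regime
# in which an atom near the apex has an appreciable chance of being ionized.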
Thermal ionization. Although the term thermal ionization is ill-defined, it is generally used in the context of heating certain complex compounds, resulting in positive-ion emission. An example is the emission of lithium ions from a heated surface coated with β-eucryptite (a lithium aluminosilicate mineral). The technique has found extensive application in mass spectroscopy to produce a wide variety of ions. The sample to be ionized is usually mixed with silica gel and phosphoric acid and deposited on a rhenium filament. After a preliminary baking, the filament is introduced into the ion source and heated to the point of positive-ion emission.

Sputtering. When a solid is bombarded with energetic heavy ions, a fraction of the sputtered particles leaves the surface as ions. This fraction is usually too low for direct application in an ion source, but the sputtering process is frequently used to introduce solids into an electron impact source such as a Penning source. See SPUTTERING.

Methods of Negative-Ion Formation

All elements can be positively ionized, but not all form stable negative ions. For example, none of the noble gases forms negative ions. However, helium is an exception in that it does have a metastable negative ion with a lifetime of about 1 millisecond. The noble gases are not the only elements that do not form stable negative ions, but most form metastable ones with lifetimes long enough to permit acceleration. Nitrogen has an exceptionally short-lived metastable negative ion, and it is customary to accelerate either NH− or NH2− molecular ions, both of which are stable.

Direct extraction. Most discharge sources, such as the duoplasmatron (Fig. 1), yield negative ions when the polarity of the extraction voltage is reversed. However, the yield is usually low, the electron current is high, and there are difficulties when operating them with elements other than hydrogen. These sources are now used almost exclusively to generate intense beams of hydrogen and its isotopes, and several important changes have been made to improve their performance. The negative-ion-to-electron yield from a direct extraction duoplasmatron can be greatly improved if the intermediate electrode is displaced by a millimeter or so off the source axis (Fig. 1b). The introduction of cesium vapor into the discharge plasma greatly enhances the negative-ion yield. Negative hydrogen currents from such sources have been increased from a few microamperes to several tens of milliamperes.

Charge exchange. When a positive-ion beam, ranging in energy from a fraction of a kiloelectronvolt to several tens of kiloelectronvolts, is passed through a gas or vapor, some of the ions emerge negatively charged. At the optimum energy (depending upon the ion), and with vapors of lithium, sodium, and magnesium, the negatively charged fraction is quite high, ranging from 1 to 90%. The technique is also highly effective for the creation of metastable negative ions, such as helium.

Cesium-beam sputtering. When a solid surface is sputtered with cesium positive ions, a surprisingly large fraction of the sputtered particles emerge as negative ions. This fraction can be increased by almost an order of magnitude by overlaying the sputter surface with additional neutral cesium. Highly efficient and versatile negative-ion sources involving cesium sputtering have been developed.

Negative surface ionization. Just as positive ions of elements possessing low ionization potentials can be generated by thermally desorbing them from high work-function surfaces, negative ions having large electron affinities can be similarly generated on a low work-function surface. The method is particularly suited to generating negative ions of the halogens, all of which have electron affinities in excess of 3 eV. A particularly suitable low-work-function surface is lanthanum hexaboride, which is reported to have a work function of about 2.6 eV.
(Fig. 1 labels include: cathode filament, 2.5 V, 30 A; gas feed; alumina ceramic feed-through insulators brazed to metal; electrode potentials of −100 V and −70 V; magnet coil of 2000 ampere-turns with coil lead and mild steel magnetic path; water cooling in and out; stainless steel body with Pyrex insulator (plastic bond) and Pyrex viewing port; discharge region concentrated by the magnetic field; intermediate electrode and anode; nickel-plated extractor electrode; extraction at −30 kV; ion beam. Scale bars: 5 cm for part a, 1 cm for parts b and c.)
Fig. 1. Duoplasmatron ion source for accelerators. (a) Diagram of entire source. (b) Detail of discharge and extraction regions. (c) Drawing of cathode filament. 1 cm = 0.4 in.
Positive-Ion Source Concepts Positive-ion source concepts include the duoplasmatron source for protons and the Penning ion source and ion confinement sources for multiply charged heavy ions. Duoplasmatron. The duoplasmatron (Fig. 1) is a high-current proton source. It makes use of an arc discharge which is constricted as it passes into a very strong magnetic field shaped by iron or mild steel inserts in an intermediate electrode and anode. The beam is extracted at the point where the arc has reached a very small diameter and a very high brilliance. Sources of this type have been developed for accelerators. See ARC DISCHARGE. Heavy-ion sources. The term heavy ion is used to designate atoms or molecules of elements heavier than helium which have been ionized. As was mentioned earlier, ion sources that are used to generate such ions fall into two categories: those intended to form singly charged ions and those designed to
produce multiply charged ions. A heavy ion can be singly ionized (one electron removed or added), can be fully stripped as in argon 18+, or can have any intermediate charge state. Singly charged heavy ions are most frequently used in isotope separators, mass spectrographs, and ion implantation accelerators. These are much easier to generate than multiply charged ions, and frequently the experimenter has several source concepts to choose from, depending upon the application and the physical characteristics of the element to be ionized.

Penning ion sources. This source is based on a high-current gaseous discharge in a magnetic field with gas at a relatively low pressure (10⁻³ torr or 0.1 pascal). The source (Fig. 2) consists of a hollow anode chamber, cathodes at each end, a means for introducing the desired element (usually a gas), and electrodes for extracting the ions (not shown). The cathode may be heated to emit electrons, which then help to initiate the arc discharge current, creating the plasma in which the atoms are ionized. The discharge column between the cathodes (the plasma) consists of approximately equal numbers of low-energy electrons and positive ions. The electron density is much larger than can be accounted for by the primary
(Fig. 2 labels: axial magnetic field; anode; cathodes; plasma; arc power supply; water cooling and electrical connection to the cathode support; copper anode with water cooling; gas inlet; ion extraction slit; tantalum cathode; boron nitride insulator; copper cathode support.)
Fig. 2. Penning ion source. (a) Schematic diagram illustrating basic principles. (b) Section showing geometry. (After J. R. J. Bennett, A review of PIG sources for multiply charged heavy ions, IEEE Trans. Nucl. Sci., NS-19(2):48–68, 1972)
TABLE 1. Penning heavy-ion source performance: charge-state distributions (IQ+/I1+)

Element     1+     3+     4+     5+     6+     7+      8+      9+     12+
Argon       1      8.5    8      3      0.8    0.09    0.01
Calcium     1      23     22     15     3      0.3     0.035
Krypton     1      5      5.5    6      4.5    4       2.2     0.6
Xenon       1      7      9      9      7.5    5.5     4       1.5    0.025
electrons from the cathodes. The average energy of plasma electrons may range from a few volts to a few tens of volts. Electrons travel parallel to the magnetic field, are reflected from the opposite cathode, and make many traversals of the length of the hollow chamber. The electrons confined by the magnetic field and the cathode potential thereby have a high probability of making ionizing collisions with any gas present in the chamber. The net result of all the processes in the arc plasma is that some partially stripped atoms diffuse perpendicular to the magnetic field out of the arc, experience the field of the accelerating electrode, and are moved into the accelerator. The arc potential may be constant with time, or it may be pulsed so that ions are produced as needed by the accelerator. High yields of charge states 1+ through 8+ (and for heavier elements, perhaps up to 12+) have been obtained for many elements of the periodic table (Table 1 shows four of these). See ARC DISCHARGE; PLASMA (PHYSICS). Most studies with Penning sources were made with gases, thus restricting nuclear research programs to a rather limited number of projectiles, until it was discovered that some of the krypton ions from the source were partially accelerated and then returned back into the source where they sputtered the source body material (copper) into the discharge, resulting in a prolific beam of highly charged copper ions. It was a short step to introduce small pieces of various solids immediately behind the ion extraction slit (Fig. 2b) and to use the source to produce beams of a wide range of elements. Roy Middleton Electron cyclotron resonance sources. An electron cyclotron resonance source for highly charged ions consists of a metallic box occupied by low-pressure vapors, microwaves, and specific magnetic fields for plasma confinement. The box can be of any shape but must be large with respect to the wavelength of the microwaves in order to act as a multimode cavity. Plasma confinement is generally obtained by superimposing solenoidal and multipolar fields such that the modulus of the magnetic field is minimal in the center of the box and maximal near the walls. In between, there should be a closed magnetic surface where the Larmor frequency of the electrons equals the frequency of the injected microwaves. Electrons crossing this surface are then energized by the electron cyclotron resonance. When they pass many times through the resonance, according to their random phase, they acquire a global stochastic electron cyclotron resonance heating, yielding
(Fig. 3 labels: first and second stages; solenoids; rf input; multimode cavity; gas inlet; minimum-magnetic-field region with SmCo5 hexapole; ECR1 and ECR2 resonance surfaces; ion extraction.)
Fig. 3. Electron cyclotron resonance ion source.
energies of tens of kiloelectronvolts, thus exceeding the ionization potentials of many highly charged ions. However, since the ion charges are obtained step by step through successive electron–ion collisions, the ions need long exposure times to many electron impacts (that is, long plasma confinement).
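The resonance condition mentioned above (the electron cyclotron, or Larmor, frequency equal to the injected microwave frequency) ties each heating frequency to a definite magnetic field value. A minimal sketch, evaluating f_ce = eB/(2πm_e) inverted for B, is given below; the frequencies chosen simply span the 2.4-16 GHz range quoted in the text.

import math

E_CHARGE = 1.602e-19      # elementary charge, C
M_E = 9.109e-31           # electron mass, kg

def resonant_field_tesla(freq_hz):
    """Magnetic field at which electrons gyrate at freq_hz."""
    return 2.0 * math.pi * freq_hz * M_E / E_CHARGE

for f_ghz in (2.45, 6.4, 10.0, 14.5, 16.0):
    b = resonant_field_tesla(f_ghz * 1e9)
    print(f"{f_ghz:5.2f} GHz -> resonant field ~ {b:.3f} T")
# Higher microwave frequencies require proportionally stronger fields on the
# resonance surface, which is one reason high-frequency sources tend to use
# superconducting coils.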
The source (Fig. 3) generally has two stages in series. In the first, a cold plasma is created at ECR1, which diffuses toward a very low-pressure second stage, where ECR2 energizes the electrons inside the confined plasma. Owing to the absence of electrodes, the longevity of the source is unlimited. The magnetic fields are obtained with ordinary or superconducting coils, permanent magnets, or a combination of these. The operation of the source can be continuous or pulsed. Most sources utilize frequencies between 2.4 and 16 GHz. The performance improves with increasing frequency. See CYCLOTRON RESONANCE EXPERIMENTS; MAGNET; SUPERCONDUCTING DEVICES. R. Geller

Electron-beam sources. The electron-beam ion source (EBIS) produces the most highly charged ions, up to fully stripped U92+. It can be operated in many different modes depending on the application, the desired ion species, charge state, and ion-beam characteristics, including its duty cycle. The EBIS normally produces ion beams with a small emittance and a narrow energy spread. The highest charge states are normally produced in a batch mode, yielding ion pulses with a charge as high as 1 nC, which allows for instantaneous peak currents as high as 1 µA. The time-averaged currents are normally in the range of nanoamperes to picoamperes because of the required batch production time, which increases with the desired charge state. The majority of EBISs or electron-beam ion traps (EBITs) are used to study how the produced highly charged, low-energy ions interact with the internal electron beam (in the EBIT) or with external gaseous or solid targets (in the EBIS). Others are used as injectors for pulsed accelerators, storage rings, or ion traps. A few are used to study further developments or new applications. The ion source (Fig. 4) is based on an electron beam which successively strips the ions until the desired high charge state is reached. The electron beam is launched from an electron gun, confined and normally compressed by a strong magnetic solenoid, and later absorbed by a collector. Ions trapped inside
(Fig. 4 labels: electron gun; gas feed and leak valve; shield; solenoid; drift tubes; suppressor; collector; repeller; lens; electron beam; extracted ion beam; potential diagram U along the drift tubes showing the gate, seed trap, dam, and main trap.)
Fig. 4. Electron-beam ion source. (a) Cross section. (b) Electric potential (U) applied to the drift tubes during the production of highly charged ions.
the space charge of the electron beam are ionized at a rate which depends on the electron beam's energy and density, with high-power electron beams and high-field solenoids producing high rates. The charge states produced increase with increasing confinement time until the ionization rate is matched by the rate of electron capture, mainly in collisions with the residual gas. Hence the highest charge states are produced with cryogenic EBISs which use a superconducting high-field solenoid (3–6 tesla) to achieve high electron-beam compression and use the required liquid helium to cryopump the production region to reduce the residual gas densities. The electron beam is surrounded by a series of drift tubes, with their potential being used to adjust the electron-beam energy and to control the flow of the ions, allowing for the variation of the confinement time. The highest charge states are normally produced in a batch mode which begins with the injection of low-charged seed ions either from an external ion source or from an internal seed trap in the fringing fields of the solenoid, where the seed gas is ionized by the lower-density electron beam. After the gate or the dam has been raised, the ions are confined in the axial direction with potentials that exceed the potential applied to the intermediate drift tubes, the main trap.
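The connection between confinement time and charge state can be illustrated with an order-of-magnitude sketch: each ionization step q to q+1 takes roughly τ ≈ e/(j_e·σ_q), where j_e is the electron-beam current density and σ_q the electron-impact ionization cross section for that step. The current density and the cross-section values used below are invented, illustrative numbers, not data for any particular ion or source.

E_CHARGE = 1.602e-19                      # elementary charge, C

def step_time(j_e_a_per_cm2, sigma_cm2):
    """Approximate time (s) for one ionization step at beam density j_e."""
    return E_CHARGE / (j_e_a_per_cm2 * sigma_cm2)

j_e = 200.0                               # A/cm^2, assumed beam current density
# Hypothetical cross sections for successive steps, shrinking as more tightly
# bound electrons must be removed (cm^2):
sigmas = [1e-16, 3e-17, 1e-17, 3e-18, 1e-18, 3e-19, 1e-19, 3e-20]

running = 0.0
for q, sigma in enumerate(sigmas, start=1):
    running += step_time(j_e, sigma)
    print(f"after step {q:2d}: cumulative confinement ~ {running:.2e} s")
# The time per step grows rapidly as the remaining electrons become harder to
# remove, which is why batch production times span such a wide range.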
After the confinement time required to reach the desired charge state (10⁻⁴ to 50 s), the main-trap potential is raised until the ions escape the source. A slow rise of the main-trap potential decreases the ion beam's energy spread and simultaneously decreases the instantaneous current while increasing the duty cycle. See MAGNET; SOLENOID (ELECTRICITY); VACUUM PUMP. Martin P. Stockli

Negative-Ion Source Concepts

The following source concepts are based on the methods of negative-ion formation discussed above.

Charge exchange source. Although charge exchange can be used to provide a very large variety of negative ions, it is relatively infrequently used, with the exception of producing the metastable negative ion of helium. Indeed, charge exchange is the sole method of producing negative helium beams. In a typical charge exchange source, about 1 milliampere of positive helium ions is generated in a duoplasmatron ion source which is usually at ground potential. The ions are extracted by an electrode at a potential of about −20 kV, focused by an electrostatic lens, and directed through a donor canal usually containing lithium vapor. In the vapor a fraction of the positively charged helium ions sequentially picks up two electrons to form negative ions. The fraction is usually a little less than 1%, resulting in a negative helium beam of several microamperes.

Cesium-beam sputter source. The cesium-beam sputter source is the negative-ion source most widely used on tandem accelerators. Much of its success is due to the fact that ion species can be changed rapidly, the ion output is usually high (Table 2), and the source will operate for several hundreds of hours before a major cleanup is necessary.
TABLE 2. Negative-ion currents from cesium-beam sputter source

Element     Negative-ion current, µA        Element     Negative-ion current, µA
Lithium      2                              Sulfur      20
Boron        2                              Nickel       6
Carbon      >50                             Copper       3
Oxygen      >50                             Gold        10
Silicon     20                              Lead         0.1
(Fig. 5 labels: cesium reservoir and Cs vapor; porous tungsten ionizer at about 1100°C with ionizer heater; Cs+ ions accelerated through about 20 kV onto the sputter cone; 20-keV negative-ion beam; gas inlet; insulators; 18-position cone wheel with cone access and viewing port; Freon cooling.)
Fig. 5. Cesium-beam sputter source: (a) schematic showing operating principles; (b) section of a typical source.
Cesium positive ions are formed in the source (Fig. 5a) by passing cesium vapor through a porous disk of tungsten, heated to about 2012°F (1100°C), by the surface ionization process. These ions are accelerated through a potential difference of between 20 and 30 kV and are allowed to impinge upon a hollow cone that is either fabricated from or contains the element whose negative ion is required. Sputtering of the cesium-coated inner surface of the cone results in a large fraction of the sputtered particles leaving the surface as negative ions. An appreciable fraction of these are extracted from the rear hole of the sputter cone because of electric field penetration and are accelerated toward the ground electrode as a beam.
(Fig. 6 labels: magnetic field of 40 mT; cathode at −150 V; xenon gas inlet; spherical sputter cathode at −1 to −2 kV; cesium vapor; filament; insulators; accelerating electrode.)
Fig. 6. Cesium-vapor Penning source for generation of negative ions.
The sputter cones or targets can be inserted into a cooled copper wheel resembling the chamber of a revolver (Fig. 5b). Thus, by rotating the wheel, sputter cones can be rapidly changed, enabling the negative-ion species to be quickly changed. In addition, negative-ion beams of gaseous elements, such as oxygen, can be formed by leaking the gas into the source and directing the flow onto a suitable getter surface such as titanium. Cesium-vapor Penning source. The cesium-vapor Penning source (Fig. 6) is a direct-extraction negative-ion source. Basically this is a conventional Penning source but with two important modifications making it suitable for negative-ion generation. The first is the introduction of a third sputter cathode, which is the source of negative ions. This cathode is made from or contains the element of interest, has a spherical face centered on the extraction aperture, and is operated at a higher negative potential than the normal cathodes. The second change involves introduction of cesium vapor into the arc chamber in addition to the support gas (usually xenon). The source operates in the normal Penning mode, and some of the cesium vapor introduced into the arc chamber becomes ionized and is accelerated toward the third sputter cathode. The negative ions that are formed as a result of sputtering are focused and accelerated toward the extraction aperture, and under the influence of the strong electric field generated by the acceleration electrode are extracted as an intense low-divergence beam. The negative-ion yield of the source is quite good and is comparable to that of the cesium-beam sputter source. Roy Middleton Polarized Ion Sources A polarized ion source is a device that generates ion beams in such a manner that the spins of the ions are aligned in some direction. The usual application
is to inject polarized ions into a particle accelerator; however, it also has applications in atomic physics. The possible types of polarized sources are numerous, for in theory the nuclei of all kinds of atoms can be polarized, provided their spin is not zero. See SPIN (QUANTUM MECHANICS).

The original type of source, developed in 1956, generates only positive ions, while the metastable-state or Lamb-shift type of polarized ion source produces a high-quality negative ion beam with a high degree of polarization. The older type of source is referred to as the conventional or ground-state type of polarized ion source. Its output current of positive ions is an order of magnitude larger than the negative-ion output from the Lamb-shift source. With these two types of sources and their variants, polarized ions have been obtained from hydrogen, deuterium, tritium, helium-3, both isotopes of lithium, and others. The extra complication involved in producing polarized ions is such that the output is a factor of a thousand or more below the output of a moderately sized unpolarized ion source.

Conventional or ground-state source. In this type of source (Fig. 7), the first step consists in forming a beam of atoms in the ground state by a technique similar to that used in molecular beams. In the case of hydrogen or deuterium, this is done by dissociating the molecular gas in a glass discharge tube and allowing the atoms to escape through a nozzle.
(Fig. 7 labels: H2 or D2 gas in; water-cooled dissociator; adjustable first skimmer; differential pumping baffle; isolation valves and oil diffusion pumps; tapered sextupole magnet; ion pump; intermediate-field rf transition unit; fast high-vacuum roughing pump; strong-field ionizer with liquid nitrogen trap; focus lens unit; 20-keV H+ or D+ beam.)
Fig. 7. Conventional or ground-state polarized ion source. (After H. F. Glavish, Polarized ion sources, in Proceedings of the 2d Symposium on Ion Sources and Formation of Ion Beams, Berkeley, pp. IV-1-1 through IV-1-7, 1974)
The atoms escaping at thermal energies are collimated into a well-directed beam by plates with holes or an adjustable skimmer. High-capacity diffusion pumps sweep away the large quantity of excess hydrogen or deuterium. See MOLECULAR BEAMS.

The beam is then passed along the axis of an inhomogeneous magnetic field, which is most commonly generated by a sextupole magnet. This type of magnet consists of six magnets arranged in a circular pattern with alternating polarities. In a sextupole magnet the absolute magnitude of the field increases as the square of the distance from the axis. The atoms are subjected to a radial force that is proportional to their magnetic moment times the gradient of the absolute magnitude of the field strength. In the case of a sextupole magnet the force is proportional to the first power of the distance from the axis. The sign of the force does not depend upon the direction of the magnetic lines, but only on the projection of the magnetic moment along these lines of force, or mj, where j is the spin value of the electron. (The atomic magnetic moment results almost entirely from the electron.) The result is that atoms with a positive value of mj are subjected to a force that is directed radially inward and pass through the sextupole magnet, while atoms with a negative mj experience a force that is directed outward and are rapidly lost from the beam. Out of the sextupole comes a beam of atomic hydrogen that is polarized with respect to the orientation of its electrons but is, as yet, unpolarized in its nuclear spin. See ELECTRON SPIN; MAGNETON.

Since aligned nuclei rather than aligned electrons are desired, it is necessary to subject the atomic beam to other fields. Each hydrogen atom is in one of two pure states. It is possible to apply an oscillating magnetic field in combination with a dc magnetic field that will flip the sign of mI (the projection of the nuclear spin) of one of the pure states and not the other. That aligns the spins of the nuclei but may depolarize the electrons. That does not matter, however, since they will be removed. The final stage is to send the atomic beam into a strong solenoidal magnetic field. As the atoms from the sextupole field—having all orientations in each cross-sectional plane—enter the solenoid, they adiabatically come into alignment with the parallel lines of force within the solenoid since their mj components of spin are conserved. In the solenoid the atoms are ionized by energetic electrons as in an arc discharge. The ionizer is actually the most difficult part of this type of polarized source to make function efficiently, even though it is conceptually simple. The ionizer is followed by electric fields that accelerate and focus the ions to get a beam that can be accepted by the accelerator.
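The state selection in the sextupole can be sketched numerically. Since |B| rises as the square of the radius, an atom whose energy increases with |B| feels a linear restoring force and oscillates about the axis, while the opposite state is pushed outward and lost. The pole-tip field, bore radius, and source temperature below are assumed illustrative values, and the hydrogen electron moment is approximated by one Bohr magneton; none of these numbers comes from the article.

import math

MU_B = 9.274e-24          # Bohr magneton, J/T (~ the hydrogen electron moment)
M_H = 1.674e-27           # hydrogen atom mass, kg
K_B = 1.381e-23           # Boltzmann constant, J/K

B_POLE = 1.0              # field at the pole tip, T (assumed)
R0 = 5.0e-3               # bore radius, m (assumed)

def radial_force(r_m, focused=True):
    """Force on the atom at radius r; negative means directed toward the axis."""
    grad_b = 2.0 * B_POLE * r_m / R0**2          # d|B|/dr for |B| ~ r^2
    return (-1.0 if focused else +1.0) * MU_B * grad_b

# Transverse oscillation of the focused (positive-mj) atoms:
omega = math.sqrt(2.0 * MU_B * B_POLE / (M_H * R0**2))     # rad/s
v_beam = math.sqrt(2.0 * K_B * 500.0 / M_H)                # ~500 K source, m/s

print(f"restoring force at r = 2 mm : {radial_force(2e-3):.2e} N")
print(f"transverse oscillation      : {omega / (2 * math.pi):.0f} Hz")
print(f"one focusing oscillation    : {2 * math.pi * v_beam / omega:.2f} m of flight")
# The negative-mj atoms see a force of the same magnitude but opposite sign,
# so their displacement grows with a comparable characteristic length and
# they strike the magnet walls instead of reaching the ionizer.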
Lamb-shift or metastable-atom source. The polarization process in the Lamb-shift type of source is also performed upon atoms, in this case, metastable ones. The process is most efficient if the atoms have a velocity of approximately 10⁻³ of the speed of light rather than thermal velocity as in the case of the ground-state type of source. To get the beam, hydrogen, deuterium, or tritium can be used, but only hydrogen is discussed in this article. The hydrogen is ionized in a conventional ion source such as a duoplasmatron. The H+ ions are then accelerated and focused into a beam at about 500 eV. The beam is passed through cesium vapor where cesium atoms donate electrons which are resonantly captured in an n = 2 state by the hydrogen ions. Atoms are formed in both the 2p and the 2s states in the cesium vapor. However, those in the 2p state decay almost immediately to the ground state. The small energy difference between the 2p and the 2s states is the Lamb shift. The lifetime of the 2p atoms is 1.6 × 10⁻⁹ s, while the lifetime of the 2s atoms is 0.15 s because two photons must be emitted simultaneously in their decay to the ground state. Actually few 2s atoms decay by emission of two photons, for they are necessarily subjected to small electric fields which mix into the 2s state the 2p wave function and its tendency to decay to the ground state. To take advantage of this tendency to decay to the ground state, apparatus can be built so that those atoms which have the undesired value of mI are stimulated to decay, while those with the desired value of mI are allowed to pass on without decay. See ATOMIC STRUCTURE AND SPECTRA; FINE STRUCTURE (SPECTRAL LINES).

The polarized H− ions are formed in argon because its atoms are capable of donating electrons to metastable atoms but have a very weak capability of forming H− ions out of ground-state atoms. The ground-state charge-changing cross section appears to be lower by a factor of about 400; however, ground-state atoms outnumber the metastable atoms in this region by a factor of 40, so that the net polarization is 90%. The remainder of the apparatus consists of electric fields that accelerate and focus the beam so it can be accepted by an accelerator. The electron spins are polarized by applying a transverse field of about 100 kV/m while the atomic beam of metastables is passing along the axis of a solenoid at a field strength of about 57.5 milliteslas. The transverse electric field couples the 2s and 2p levels through the Stark effect, and the magnetic field is just sufficient to bring the levels with mj = −1/2 very close together in energy, while those with mj = +1/2 have their energy separation doubled, so that 2s atoms with mj = +1/2 are transmitted without loss. See STARK EFFECT; ZEEMAN EFFECT.

There are several methods of going on to polarize the nuclei, including a device known as the spin filter. To produce the spin filter, a longitudinal electric field of about the same strength as the transverse field is added to the apparatus that polarizes the electrons, with the longitudinal field oscillating at about 1.60 GHz. If the magnetic field is adjusted so that the Larmor frequency of the electron in the metastable atom is made equal to the frequency of the oscillating electric field, then the lifetime of the atom for decay becomes very long exactly at resonance, yet short not far off resonance. The magnetic field that determines the Larmor frequency of the electron in the metastable atom is the sum of that due to the solenoid and that due to the proton aligned in the solenoidal field. These two fields have opposite signs in the case of mI = −1/2,
and it is found that the two resonances for transmission are at 54.0 mT for mI = +1/2 and at 60.5 mT for mI = −1/2. In the case of deuterons, there are three resonances, and they are well resolved even though mI = +1 is at 56.5 mT, 0 is at 57.5 mT, and −1 is at 58.5 mT. Joseph L. McKibben
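The spin-filter numbers quoted above can be checked with a one-line calculation of the free-electron Larmor frequency, f = g_e·μ_B·B/h; only standard physical constants are used, and the field values are those given in the text.

G_E = 2.0023                   # electron g-factor
MU_B = 9.274e-24               # Bohr magneton, J/T
H_PLANCK = 6.626e-34           # Planck constant, J s

def electron_larmor_ghz(b_tesla):
    """Electron Larmor (precession) frequency in GHz at field b_tesla."""
    return G_E * MU_B * b_tesla / H_PLANCK / 1e9

for b_mt in (54.0, 57.5, 60.5):
    print(f"B = {b_mt:.1f} mT -> f_Larmor ~ {electron_larmor_ghz(b_mt * 1e-3):.2f} GHz")
# 57.5 mT gives about 1.61 GHz, consistent with the ~1.60-GHz oscillating
# field; the resonances at 54.0 and 60.5 mT reflect the small field that the
# aligned proton adds to or subtracts from the solenoid field.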
Colliding-beam source. A novel method of making polarized beams of negative hydrogen and deuterium is based on the direct conversion of polarized neutral hydrogen (H0) or deuterium (D0) atoms into polarized negative ions using the reaction below.

H0 (polarized) + Cs0 → H− (polarized) + Cs+

Although conceptually very attractive, this idea presents some severe experimental difficulties. Undoubtedly the greatest of these is that if the polarized H0 or D0 atoms are produced at thermal energies, where production is greatly facilitated, the cross section or probability of the above reaction proceeding is extremely small. To circumvent this difficulty and to capitalize on a much higher cross section, it was proposed to accelerate a positively charged cesium (Cs) beam to an energy of about 40 keV, neutralize it by passing it through a canal containing cesium vapor, and allow this high-velocity beam of neutral cesium atoms to collide with the polarized atomic beam. Such a source has been built and demonstrated to yield 2.9 microamperes of polarized hydrogen negative ions and 3.1 µA of deuterium ions. These currents are about five times larger than those obtainable from the best Lamb-shift sources. Roy Middleton

Bibliography. L. W. Anderson (ed.), Polarized Ion Sources and Polarized Gas Targets, 1994; I. G. Brown (ed.), The Physics and Technology of Ion Sources, 1989; A. Hershcovitch (ed.), International Symposium on Electron Beam Ion Sources and Their Applications, 1989; R. Marrus (ed.), Physics of Highly-Ionized Atoms, 1989; J. Parker (ed.), International Conference on ECR Ion Sources, Michigan State University, C.P. 47, 1987; G. Roy and P. Schmor (eds.), Polarized Proton Ion Sources, Conference Proceedings, TRIUMF, Vancouver, 1983.
Ion transport Movement of salts and other electrolytes in the form of ions from place to place within living systems.

Animals

Ion transport may occur by any of several different mechanisms: electrochemical diffusion, active transport requiring energy, or bulk flow as in the flow of blood in the circulatory system of animals or the transpiration stream in the xylem tissue of plants.

Sodium/potassium pump. The best-known system for transporting ions actively is the sodium/potassium (Na/K) exchange pump which occurs in plasma membranes of virtually all cells. The pump is especially concentrated in the cells of membranes such as frog skin, the tubules of vertebrate kidneys, salt
glands in certain marine reptiles and birds, and the gills of fresh-water fishes and crustaceans, which transport sodium ion (Na+) actively. The physical basis of the Na/K pump has been identified as a form of the enzyme adenosine triphosphatase (ATPase) which requires both Na+ and K+ for its activity of catalyzing the hydrolysis of adenosine triphosphate (ATP). This reaction provides the energy required for the extrusion of Na+ from cells in exchange for K+ taken in, but the mechanism of coupling of the energy liberation with the transport process has not been established. The ATP which provides energy for this and other processes of cellular work is formed by synthesis coupled with cellular oxidations. Again, the precise coupling mechanism linking oxidation with synthesis of ATP has not been established heretofore. Chemiosmotic hypothesis. The basis of the hypothesis was the concept that, if the enzyme system responsible for the transport process is embedded in the plasma membrane with an orientation such that the direction of the reaction catalyzed is parallel to the directional or vectorial orientation of the transport across the membrane, then the vectorial character of transport processes may be explained. Proton pumps. Although this hypothesis was not generally accepted, experimental studies revealed that many transport processes, such as in bacterial cells and in the mitochondria of eukaryotic cells, are associated with a transport of protons (hydrogen ions, H+). This fact led to the concept of proton pumps, in which the coupling or transfer of energy between oxidation processes and synthesis of ATP and between hydrolysis of ATP and transport or other cellular work is explained in terms of a flow of protons as the means of energy transfer. Electron transport system. The processes of oxidation in the citric acid cycle of reactions in mitochondria are known to be coupled with the synthesis of ATP, which is formed from adenosine diphosphate (ADP) and inorganic orthophosphate (Pi), through the system of enzymes and cytochromes known as the electron transfer chain or electron transport system. This system transports electrons, removed in dehydrogenation from the organic molecules of the citric acid cycle on one side of the mitochondrial membrane, to the site of their incorporation into water, formed from two hydrogen ions and an atom of oxygen on the other side of the membrane (Fig. 1). The flow of electrons from a relatively high potential level in the organic substrate to a level of lower potential in water constitutes, in effect, a current of negative electricity, and it was proposed that the flow drives a flow of protons in the opposite direction, as a current of positive electricity. This proton flow in turn is proposed as the force that drives the synthesis of three molecules of ATP for every two electrons flowing through the electron transport system. In effect, this is the machinery of the cellular power plant. Na/K ATPase pump. The Na/K ATPase pump (Fig. 2) then provides an example of a way in which a proton pump may transfer energy between the
(Fig. 1 shows foodstuffs feeding substrates into the citric acid cycle, with CO2 released; electrons removed from the substrates flow through NAD, flavoprotein, and cytochromes b, c, and a to combine with 2H+ and ½O2 to form H2O, while protons flow in the opposite direction; at three coupling sites ADP3− + Pi3− is converted to ATP4−. A key distinguishes the flow of electrons, the flow of protons, and substrates in the citric acid cycle.)
Fig. 1. Electron transport system as a proton pump.
(Fig. 2 shows ATP hydrolyzed to ADP + Pi + 2H+ at the inside face of the membrane, coupled through the phosphatide-bound enzyme to the outward movement of Na+ and the inward movement of K+.)
Fig. 2. Hypothetical model for the Na/K ATPase pump.
hydrolysis of ATP and a process of cellular work. The enzyme which is the basis of the pump is known to be bound to the lipid bilayer of the plasma membrane through phosphatides and to function only when so bound. The binding of Na+, K+, H+, and ATP to active sites on the enzyme presumably has an allosteric effect, changing the shape of the enzyme molecule, activating the hydrolysis of ATP, and opening pathways of exchange of Na+ and K+. If this exchange were to be accompanied by a flow of H+ as suggested in Fig. 2, the pump would be electrogenic,
contributing to the internal negativity characteristic of nearly all cells by a net outward transport of positive electricity. Bradley T. Scheer Plants Transport processes are involved in uptake and release of inorganic ions by plants and in distribution of ions within plants, and thus determine ionic relations of plants. Scope. Photoautotrophic plants require nutrition with inorganic ions for growth and maintenance,
that is, for osmoregulation, for sustainment of structure, and for control of metabolism. Ion transport physiology is concerned with the mechanisms of movement along various pathways of transport on organizational levels differing in their degree of complexity. Furthermore, ion transport is the basic functional component of some physiological reactions in plants. In several cases, ion transport processes are the physiological basis for ecological adaptations of plants to particular environments.

Organization levels. In unicellular, filamentous, or simple thalloid algae, in mosses, in poorly differentiated aquatic higher plants, and experimentally in cell suspension cultures of higher plants, ion transport can be considered on the cellular level. In these systems, all cells take up ions directly from the external medium. The cell wall and the external lipid-protein membrane (plasmalemma) have to be passed by the ions. Intracellular distribution and compartmentation are determined by transport across other membranes within the cells. The most important one is the tonoplast separating the cell vacuole from the cytoplasm. Within tissues the continuous cell walls of adjacent cells form an apoplastic pathway for ion transport. A symplastic pathway is constituted by the cytoplasm extending from cell to cell via small channels of about 40 nanometers diameter (plasmodesmata) crossing the cell walls. Transport over longer distances is important in organs (roots, shoots, leaves, fruits), which are composed of different kinds of tissues, and in the whole plant. Xylem and phloem serve as pathways for long-distance transport. Roots take up ions from the soil and must supply other plant organs. But there is also circulation within the plant; for example, potassium, an inorganic cation of major importance in plants, is readily transported both from root to shoot and in the opposite direction. A cooperation of roots and shoots is observed during reduction of sulfur and nitrogen. The major nutrient elements sulfur, nitrogen, and phosphorus are taken up in the form of the inorganic anions, sulfate, nitrate, and phosphate. Phosphate is used in metabolism in the oxidized form in which it is available to the plant. NO3− and SO42− must be reduced. The reduction capacity of roots can be limited. The anions are transported via the xylem to the leaves, where reduction proceeds using photosynthetic energy, reduction equivalents, and carbon skeletons for the attachment of reduced nitrogen and sulfur. Reduced nitrogen and sulfur compounds can be exported via the phloem. The nutritional status of roots and shoots regarding both inorganic anions and organic substrates plays a large role in regulation of ionic relations of whole plants. Phytohormones affect transport mechanisms; they are produced in particular tissues, are distributed via the transport pathways, and thus exert a signaling function. See PHLOEM; PLANT HORMONES; PLANT TISSUE SYSTEMS; XYLEM.
Transport mechanisms. There are a variety of basic and special mechanisms for the different pathways of ion transport.

Apoplastic and symplastic transport. The cell wall contains amorphic hemicellulosic and pectic substances and proteins. These constituents provide fixed electrical charges for ion exchange, which are largely negative because of the carboxyl groups of pectic acids; positive charges arise from the amino groups of proteins. The cellulose micelles and fibrils of the cell wall have free spaces of up to 10 nm diameter between them, allowing transport in an aqueous phase. Thus transport in the cell wall, that is, the apoplastic space, is based on physical mechanisms and driving forces such as ion exchange, diffusion in solution, and flow of solution. After uptake via the plasmalemma, ion transport in the symplast is also maintained by physical driving forces. But metabolic energy is required for maintenance of the transport pathway and of concentration gradients. Cytoplasmic streaming can also play a role. See CELL WALLS (PLANT).

Membrane transport. A major issue in ion transport is movement across lipid-protein membranes. It is governed by physical and metabolism-dependent driving forces. Diffusion of ions leads to a gradient of electrical potential and of ion concentration at the membrane. In plant cells, several mechanisms of primary active transport use metabolic energy (ATP or pyrophosphate, PPi) directly to pump protons or calcium ions across membranes. The proton pumps establish electrochemical gradients of protons across the membranes. The calcium pumps participate in keeping cytosolic Ca2+ concentrations at a low level, which is important for the role of Ca2+ as a secondary messenger regulating cellular functions. The proton gradients serve secondary active transport since they can drive transport of other ions and of nonelectrolyte solutes. In this way, most ion transport processes at membranes of plants can be explained. Catalyzers of transport or carriers are involved. Transport often shows saturation kinetics with increasing concentration of the transported solute.

Transport in organs. Several mechanisms, corresponding to the various pathways, are involved in transport in organs. The most important example is the root (Fig. 3), although similar evaluations can be made for leaves or for specialized systems such as salt glands. Ions are absorbed from the soil by ion exchange at the root surface or by diffusion. The apoplastic pathway is blocked at a cell layer somewhat inside the root (the endodermis) where the cell walls are encrusted with an amorphic hydrophobic polymer (Casparian strips of endodermal cells). Thus, ions must cross the plasmalemma and pass the cytoplasm on their way to the xylem in the interior of the roots. Depending on the relative sizes of the various driving forces and resistances, both the apoplastic and the symplastic pathways can be used up to the endodermis. Two membrane transport processes involving metabolism-dependent ion pumping are involved: uptake from the medium or apoplast into the symplast at the plasmalemma in the root tissue peripheral to the endodermis; and transport from the root cells lining the xylem vessels into these vessels.
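Whether a particular ion distribution across a membrane actually requires such metabolism-dependent pumping is commonly judged by comparing the measured membrane potential with the Nernst equilibrium potential for that ion. This criterion is standard electrophysiology rather than part of the text above, and the concentrations and the membrane potential in the sketch are invented illustrative values.

import math

R_GAS = 8.314      # J/(mol K)
FARADAY = 96485.0  # C/mol
T = 298.0          # K

def nernst_mv(z, c_out_mm, c_in_mm):
    """Equilibrium potential (inside relative to outside), in millivolts,
    from E = (R*T / z*F) * ln(c_out / c_in)."""
    return 1000.0 * (R_GAS * T / (z * FARADAY)) * math.log(c_out_mm / c_in_mm)

membrane_potential_mv = -150.0     # assumed, typical order for a plant cell

for ion, z, c_out, c_in in (("K+", +1, 1.0, 100.0),
                            ("Na+", +1, 1.0, 10.0),
                            ("Cl-", -1, 1.0, 10.0)):
    e_n = nernst_mv(z, c_out, c_in)
    print(f"{ion:3s}: E_Nernst = {e_n:7.1f} mV  vs membrane {membrane_potential_mv} mV")
# In this invented example K+ is not far from electrochemical equilibrium,
# whereas the Cl- distribution lies far from its Nernst potential and would
# have to be maintained by active transport.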
(Labeled in Fig. 3: root hair, endodermis, xylem vessels.)
Fig. 3. Scanning electron micrograph of the cross section of a sunflower root, about 1 mm in diameter.
Long-distance transport in whole plant. The pipe system of the xylem in its mature transporting state is composed of rows of dead cells (tracheids, tracheary elements) whose cross-walls are perforated or removed entirely. The driving force for long-distance transport in the xylem is very largely passive. Transport is caused by transpiration, the loss of water from the aerial parts of the plant, driven by the water potential gradient directed from soil to roots, leaves, and atmosphere. A normally much smaller component driving the ascent of sap in the xylem is osmotic root pressure, due to the pumping mechanisms concentrating ions in the root xylem, with water following passively. This can be demonstrated by attaching capillaries to the cut ends of isolated roots; it is the reason for the phenomenon of guttation, the exit of little droplets of sap from small openings in the leaves (Fig. 4). Root pressure amounts to only a few atmospheres, and, taking the resistance of the transport pathway into account, it can only supply shoots of small plants.
Fig. 4. Guttation droplets at the leaf teeth of lady's-mantle (Alchemilla).
In a simplifying way the xylem can be considered as the pathway for long-distance transport of ions from root to shoot, and the phloem as the pathway for metabolite transport from photosynthesizing source leaves to various sinks in the plant. But inorganic ions can also move in the phloem, and this is important for recirculation in the plant. Only a limited number of ions are exempt from translocation in the phloem (such as Ca2+). The long-distance transport pathways of the phloem are the sieve tubes, pipe systems with porous structures in the cross-walls (sieve plates) but, in contrast to vessels of the xylem, having living cytoplasm. Concentration and pressure gradients built up by active loading and unloading of sieve tubes in the source and sink regions, respectively, are the driving forces for transport.

Special adaptations. Ion transport mechanisms can provide the physiological basis for the adaptation to various environmental stress situations. For instance, iron-efficient dicotyledonous plants can increase the electron transport capacity at the plasmalemma of the cells on their root surfaces for reduction of Fe3+ to Fe2+ and hence for improved uptake of iron. Calcicole plants live only on calcareous soils, calcifuge plants exclusively on acid low-Ca2+ soils. These preferences are genetically fixed. There are also genotypic adaptations to high levels of various metal ions such as aluminum, zinc, copper, nickel, and lead. Particular transport properties of plants are of agricultural importance and thus interesting for breeders. There is also responsiveness to changes in the environment. In response to iron deficiency, some plants can thicken their root tissue behind the tips and form infoldings of the walls of peripheral root cells, increasing the plasmalemma surface for facilitation of transport. These structures disappear again as stress is released. The most important example of stress is salinity due to NaCl. Some plants are moderately resistant by excluding NaCl at the level of the roots. The uptake into the root has K+/Na+ selectivity; K+ can be taken up preferentially by exchanging for Na+, which is released from the root tissue. Na+ can also be reabsorbed from the xylem fluid along the pathway to the shoot, mostly in the upper root and lower shoot parts. At stronger salt stress, that is, in an environment of very low water potential, resistant plants must absorb the salt for osmoregulation. This causes problems because large Na+ levels are not compatible with the cytoplasmic machineries of metabolism. Thus, NaCl is sequestered in the vacuoles of the cells by transport across the tonoplast. To avoid shrinkage and dehydration, and maintain the water structures of the cytoplasm, compatible organic compounds are synthesized and accumulated in the cytoplasm, which often occupies only a few percent of the total cell volume of such rather succulent plants. Compatible solutes are, for example, polyalcohols and quaternary ammonium and sulfonium compounds. See PLANTS OF SALINE ENVIRONMENTS.

Special physiological mechanisms. Growing plants depend on ion transport not only for nutrition. Extension growth of young organs is driven by an
H+−K+ exchange mechanism. The proton extrusion pump of cells is stimulated by the phytohormone β-indole acetic acid. K+ is taken up in exchange for H+ to provide electrical charge balance. But, without a mechanism regulating pH, the cytoplasm would become unduly alkaline. This is controlled by CO2 dark fixation giving malic acid via phosphoenolpyruvate and oxaloacetate. The protons of the newly formed carboxyl groups are extruded; the organic anion malate together with K+ is transported into the vacuoles, acting as osmotically active material building up turgor pressure, which drives cell extension. In this way, hydrogen ion pumps quite generally can be parts of pH-regulating mechanisms and turgor-dependent processes. Other examples are movements of stomatal guard cells and of pulvini bringing about nyctinastic and seismonastic changes of leaf positions. Turgor changes in the guard cells can operate via H+−K+ exchange and malic acid synthesis; increased turgor leads to opening of, and decreased turgor to closing of, stomata. Movements and other reactions in plants are accompanied by action potentials. The ionic basis for plant action potentials has been investigated in some algae (Characeae, Acetabularia). Depolarization and repolarization of the membranes during action potentials are due to efflux of Cl− followed by K+. In other cases, Ca2+ is also involved. See PLANT TRANSPORT OF SOLUTES; PLANT-WATER RELATIONS. Ulrich Lüttge Bibliography. F. M. Harold, The 1978 Nobel Prize in chemistry, Science, 202(4373), December 15, 1978; N. Higinbotham, Electropotential of plant cells, Annu. Rev. Plant Physiol., 24:25–46, 1973; U. Lüttge and N. Higinbotham, Transport in Plants, 1979; U. Lüttge and M. G. Pitman (eds.), Transport in plants II, part A: Cells, part B: Organs and tissues, Encyclopedia of Plant Physiology, New Series, 1976; M. G. Pitman, Ion transport in the xylem, Annu. Rev. Plant Physiol., 28:71–88, 1977; L. Reinhold and A. Kaplan, Membrane transport of sugars and amino acids, Annu. Rev. Plant Physiol., 35:45–83, 1984.
Ionic crystals A class of crystals in which the lattice-site occupants are charged ions held together primarily by their electrostatic interaction. Such binding is called ionic binding. Empirically, ionic crystals are distinguished by strong absorption of infrared radiation, good ionic conductivity at high temperatures, and the existence of planes along which the crystals cleave easily. See CRYSTAL STRUCTURE. Compounds of strongly electropositive and strongly electronegative elements form solids which are ionic crystals, for example, the alkali halides, other monovalent metal halides, and the alkalineearth halides, oxides, and sulfides. Crystals in which some of the ions are complex, such as metal carbonates, metal nitrates, and ammonium salts, may also be classed as ionic crystals.
As a crystal type, ionic crystals are to be distinguished from other types such as molecular crystals, valence crystals, or metals. The ideal ionic crystal as defined is approached most closely by the alkali halides. Other crystals often classed as ionic have binding which is not exclusively ionic but includes a certain admixture of covalent binding. Thus the term ionic crystal refers to an idealization to which real crystals correspond to a greater or lesser degree, and crystals exist having characteristics of more than one crystal type. See CHEMICAL BONDING. Ionic crystals, especially alkali halides, have played a very prominent role in the development of solid-state physics. They are relatively easy to produce as large, quite pure, single crystals suitable for accurate and reproducible experimental investigations. In addition, they are relatively easy to subject to theoretical treatment since they have simple structures and are bound by the well-understood Coulomb force between the ions. This is in contrast to metals and covalent crystals, which are bound by more complicated forces, and to molecular crystals, which either have complicated structures or are difficult to produce as single crystals. Being readily available and among the simplest known solids, they have thus been a frequent and profitable meeting place between theory and experiment. These same features of ionic crystals have made them attractive as host crystals for the study of crystal defects: deliberately introduced impurities, vacancies, interstitials, and color centers. See COLOR CENTERS; CRYSTAL DEFECTS. Crystal structure. The simplest ionic crystal structures are those of the alkali halides. At standard
Fig. 1. Sodium chloride lattice. The darker circles represent positive ions and the lighter circles negative ions. (After F. Seitz, The Modern Theory of Solids, Dover, 1987)
Fig. 2. Cesium chloride lattice. (After F. Seitz, The Modern Theory of Solids, Dover, 1987)
Fig. 3. Wurtzite lattice. (After F. Seitz, The Modern Theory of Solids, Dover, 1987)
Fig. 4. Zinc blende lattice. (After F. Seitz, The Modern Theory of Solids, Dover, 1987)
temperature and pressure the 16 salts of lithium (Li), sodium (Na), potassium (K), and rubidium (Rb) with fluorine (F), chlorine (Cl), bromine (Br), and iodine (I), have the sodium chloride structure of interpenetrating face-centered cubic lattices (Fig. 1). Cesium fluoride (CsF) also has this structure but otherwise the cesium halides have the cesium chloride structure of interpenetrating simple cubic lattices (Fig. 2). The sodium chloride structure is also assumed by the alkaline-earth oxides, sulfides, and selenides other than those of beryllium (Be) and by silver fluoride (AgF), silver chloride (AgCl), and silver bromide (AgBr). Other crystal structures, such as the wurtzite structure (Fig. 3) assumed by beryllium oxide (BeO), β-zinc sulfide (β-ZnS, also known as wurtzite), and zinc oxide (ZnO) and the zinc blende structure (Fig. 4) assumed by copper(I) chloride (CuCl), copper(I) bromide (CuBr), copper(I) iodide (CuI), beryllium sulfide (BeS), and α-zinc sulfide (α-ZnS, also known as sphalerite or zinc blende) are also typical of the ionic crystals of salts in which the atoms have equal positive and negative valence. Ionic compounds consisting of combinations of monovalent and divalent elements crystallize typically in the fluorite structure (Fig. 5) assumed by calcium fluoride (CaF2), barium fluoride (BaF2), cadmium fluoride (CdF2), lithium oxide (Li2O), lithium sulfide (Li2S), sodium monosulfide (Na2S), copper(I) sulfide (Cu2S), and copper(I) selenide, or the rutile structure (Fig. 6) assumed by titanium(IV) oxide (TiO2, also known as rutile), zinc fluoride (ZnF2), and magnesium fluoride (MgF2). Cohesive energy. It is possible to understand many of the properties of ionic crystals on the basis of a
simple model originally proposed by M. Born and E. Madelung. In the simplest form of this model, the lattice sites are occupied by spherically symmetric ions having charges corresponding to their normal chemical valence. These ions overlap the ions at neighboring sites only slightly and interact with one another through central forces. In sodium chloride (NaCl), for example, the spherically symmetric closed shell configurations which the free Na+ and Cl− ions possess are considered to be negligibly altered by the crystalline environment and to have charges +e and −e, respectively, where −e is the charge on the electron. Using this model, together with certain assumptions about the forces between the ions, Born and M. Göppert-Mayer calculated the cohesive energy of a number of ionic crystals. This cohesive energy is defined as the energy necessary to take an ionic crystal from an initial state, in which the crystal is at 0 K (−459.67°F) and zero pressure, to a final state in which the constituent ions are separated to infinity as a gas, also at 0 K and zero pressure. While it cannot be measured directly, this cohesive energy can be deduced from experimental quantities by the use of the Born-Haber cycle. Thus the validity of the simple model of Born and Madelung can be tested by comparing the calculated cohesive energy of Born and Mayer with values which have been experimentally determined. See COHESION (PHYSICS). Born-Haber cycle. This is a sequence of processes leading from the initial to the final state specified in the definition of the cohesive energy. Because in most of the processes in this cycle heat changes at constant pressure are measured, it is convenient to consider the change in heat content or enthalpy
Fig. 5. Calcium fluoride lattice. (After F. Seitz, The Modern Theory of Solids, Dover, 1987)
Fig. 6. Rutile lattice. (After F. Seitz, The Modern Theory of Solids, Dover, 1987)
H = E + PV, where P is the pressure, V is the volume, and E is the internal energy, rather than the change in E in each step. Since the total change in H is independent of the intermediate steps taken to accomplish it, the ΔH for the change in state specified in the definition of the cohesive energy will be given by the sum of the ΔH values for the steps of the cycle. Furthermore, because when P = 0, ΔH = ΔE, the ΔH thus calculated will be the cohesive energy. In the following enumeration of the steps in the Born-Haber cycle, (A) indicates element A in a monatomic gaseous state and [A] indicates A in a solid state, and so on. The B without brackets in step 3 refers to the natural form of B at the given temperature and pressure. The ionic compound is taken to be AB, where A is the electropositive and B the electronegative element. The steps of the Born-Haber cycle (all temperatures in kelvins and pressures in atmospheres) are:
1. [AB] (0 K, P = 0) → [AB] (0 K, P = 1). The value of ΔH1 in this isothermal compression is very small and can be neglected in comparison with other heat content changes in the cycle.
2. [AB] (0 K, P = 1) → [AB] (298 K, P = 1). In this step the crystal is warmed to room temperature. The value of ΔH2 can be calculated from the specific heat at constant pressure for the crystal.
3. [AB] (298 K, P = 1) → [A] (298 K, P = 1) + B (298 K, P = 1). The value of ΔH3 is given by the heat of formation of the compound AB, which is referred to substances in their natural forms at standard temperature and pressure.
4. B (298 K, P = 1) → (B) (298 K, P = 1). The value of ΔH4 is the dissociation energy necessary to form a monatomic gas from B in its natural state at standard temperature and pressure. For chlorides, for example, this is the dissociation energy of a Cl2 molecule into Cl atoms.
5. [A] (298 K, P = 1) → (A) (298 K, P = 1). In this step, ΔH5 is the heat of sublimation of the metal A. It can be deduced from the heat of fusion, the specific heats of the solid, liquid, and gaseous phases, and the vapor pressure data for the metal.
6. (A) (298 K, P = 1) → (A) (0 K, P = 0); (B) (298 K, P = 1) → (B) (0 K, P = 0). An adiabatic expansion of the gases, considered as ideal, to a very large volume results in a state in which P = 0, T = 0, and ΔH6 = −(5/2)RT per mole, where R is the gas constant.
7. (A) (0 K, P = 0) → (A+) (0 K, P = 0) + e−. The ionization of the A atoms gives a ΔH7 per atom equal to their first ionization energy.
8. (B) (0 K, P = 0) + e− → (B−) (0 K, P = 0). The electrons from step (7) are placed on the B atoms. The value of ΔH8 per atom is given by the electron affinity of the B atom. See ELECTRONEGATIVITY; SUBLIMATION.
As an example, for sodium chloride ΔH1 ≈ 10⁻⁴ kilocalorie/mole, ΔH2 = 2.4, ΔH3 = 98.3, ΔH4 = 26.0, ΔH5 = 28.8, ΔH6 = −2.9, ΔH7 = 118.9, and ΔH8 = −80.5. Experimental cohesive energies for a number of the ionic crystals are given in the table. See STRUCTURAL CHEMISTRY.
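The bookkeeping in the cycle can be checked numerically. The short sketch below (Python) simply sums the step values quoted above for sodium chloride; those values are approximate and of older vintage, so the total only roughly reproduces the experimental cohesive energy listed in the table.

```python
# Born-Haber cycle for NaCl: the cohesive energy is the sum of the enthalpy
# changes of the eight steps (values in kcal/mole, as quoted in the text).
steps = {
    1: 1e-4,   # isothermal compression of the crystal (negligible)
    2: 2.4,    # warming the crystal from 0 K to 298 K
    3: 98.3,   # decomposition of NaCl into Na(s) and chlorine in its natural form
    4: 26.0,   # dissociation of Cl2 into Cl atoms
    5: 28.8,   # sublimation of sodium metal
    6: -2.9,   # adiabatic expansion/cooling of the ideal gases to P = 0, T = 0
    7: 118.9,  # first ionization energy of Na
    8: -80.5,  # electron affinity of Cl (energy released)
}

cohesive_energy = sum(steps.values())
print(f"Born-Haber estimate of the cohesive energy of NaCl: {cohesive_energy:.1f} kcal/mole")
# The table lists about 185 kcal/mole for NaCl; the few-percent difference
# reflects the rounding and vintage of the individual step values.
```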
Cohesive energies∗

Crystal   Structure     Uexp, kcal/mole†   Ucalc, kcal/mole†   Ucalc refined, kcal/mole†
LiCl      NaCl          201.5              196.3               200.2
LiBr      NaCl          191.5              184.4               189.5
LiI       NaCl          180.0              169.1               176.1
NaCl      NaCl          184.7              182.0               183.5
NaBr      NaCl          175.9              172.7               175.5
NaI       NaCl          166.3              159.3               164.3
KCl       NaCl          167.8              165.7               167.9
KBr       NaCl          161.2              158.3               161.3
KI        NaCl          152.8              148.2               152.4
RbCl      NaCl          163.6              159.1               162.0
RbBr      NaCl          159.0              151.9               156.1
RbI       NaCl          149.7              143.1               148.0
CaF2      Fluorite      618.0              617.7
CuCl      Zinc blende   226.3              206.1
ZnS       Wurtzite      851                816
PbO2      Rutile        2831               2620
AgCl      NaCl          207.3              187.3

∗The cohesive energies in the last two columns are calculated using the Born-Mayer equation and the refined Born-Mayer theory, respectively. The refined calculations for the last five crystals have not been made.
†1 kcal = 4.184 kJ.
Born-Mayer equation. By use of the Born-Madelung model, the cohesive energy of an ionic crystal can be related to its measured compressibility and lattice spacing. Because of the opposite signs of electric charges which they carry, the unlike ions in such a crystal model attract one another according to Coulomb's law. However, such a charge distribution cannot be in equilibrium if only Coulomb forces act. In addition to their Coulomb interaction, the ions exhibit a repulsion which, compared with the Coulomb interaction, varies rapidly with interionic separation. The repulsion becomes strong for small separations and diminishes rapidly for increasing separation. The static equilibrium configuration of the crystal is determined by a balance of these forces of attraction and repulsion. See COULOMB'S LAW. The short-range repulsion between ions must be described by quantum mechanics. When the electron orbits of two ions overlap, the electron charge density in the region of overlap is diminished as a consequence of the Pauli exclusion principle. This charge redistribution results in a repulsion between the ions in addition to the Coulomb interaction, which they have at all interionic distances. In early work, the energy Vrep due to repulsion of two ions at a distance r was assumed to have the form of Eq. (1),

Vrep = B/rⁿ     (1)

where B and n are constants to be determined. Quantum-mechanical calculations of the interaction of atoms with closed shells of electrons indicate that the interaction of repulsion is better approximated by an exponential dependence on interionic distance given in Eq. (2),

Vrep = A exp(−r/ρ)     (2)

where A and ρ are constants. Both forms for Vrep give almost the same calculated cohesive energy; the exponential form gives slightly better agreement with experiment. See EXCLUSION PRINCIPLE. Using the exponential form for the repulsive interaction energy, the potential energy ϕ(rij) of a pair of ions i and j can be written as Eq. (3),

ϕ(rij) = ZiZje²/rij + A exp(−rij/ρ)     (3)

where Zie and Zje are the net charges of the ions i and j, and rij is the distance of separation of their centers. The assumption that the ions are spherically symmetric has been used here in writing the Coulomb interaction as that of point charges. The cohesive energy U of an ionic crystal due to the Coulomb and repulsive interactions of its ions is the sum taken over all pairs of ions in the crystal as in Eq. (4),

U = (1/2) Σ′i,j ϕ(rij)     (4)

where in the summation the lattice site indices, i and j, range over all sites of the crystal. The prime on the summation sign indicates the exclusion from the sum of terms for which i = j, and the factor of 1/2 avoids counting pairs of ions twice. For crystals in which there are only two types of ion, the Coulomb or electrostatic part of U, Ue, can be written in a simple form given in Eq. (5),

Ue = (1/2) Σ′i,j ZiZje²/rij = −NαM(Z+Z−)e²/r     (5)

where +Z+e and −Z−e are the charges of the positive and negative ions, N is the number of ion pairs in the crystal, r is the nearest anion-cation separation, and αM is the Madelung constant. See MADELUNG CONSTANT. By anticipating that ρ will be small compared to the nearest neighbor separation, the interactions of repulsion may be neglected for pairs of ions other than nearest neighbors. The energy of the crystal model for arbitrary nearest neighbor separation r is then given by Eq. (6),

U(r) = N[−αMZ+Z−e²/r + MA exp(−r/ρ)]     (6)

where M is the number of nearest neighbors which each ion has in the crystal. The parameter ρ may be evaluated for a given crystal by requiring that (1) U be a minimum for the observed value of r and (2) that the compressibility of the model equal the measured compressibility of the crystal. It follows from these requirements that Eq. (7) holds,

U(r0) = NαMZ+Z−(1 − ρ/r0)e²/r0     (7)

This is the Born-Mayer equation for the cohesive energy, where r0 refers to the nearest neighbor distance at static equilibrium. Further, in this equation, ρ is given in terms of experimental quantities by Eq. (8),

r0/ρ = 18r0⁴/(αMe²K) + 2     (8)

where K is the measured compressibility of the crystal.
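As a quick numerical illustration of Eq. (7), the sketch below evaluates the Born-Mayer cohesive energy per mole of NaCl. The input values (Madelung constant 1.7476 for the sodium chloride structure, r0 = 2.82 Å, ρ = 0.32 Å) are commonly quoted illustrative numbers and are assumptions here rather than values taken from the text.

```python
# Born-Mayer cohesive energy, Eq. (7): U = N*alphaM*Z+*Z-*(1 - rho/r0)*e^2/r0
# Illustrative (assumed) parameters for NaCl; e^2 = 14.40 eV*angstrom in Gaussian units.
ALPHA_M = 1.7476      # Madelung constant, NaCl structure
R0 = 2.82             # nearest-neighbor separation, angstroms
RHO = 0.32            # repulsion range parameter, angstroms
E2 = 14.40            # e^2 in eV*angstrom
KCAL_PER_EV = 23.06   # 1 eV per ion pair = 23.06 kcal/mole

u_pair_eV = ALPHA_M * (1.0 - RHO / R0) * E2 / R0   # Z+ = Z- = 1 for NaCl
u_mole_kcal = u_pair_eV * KCAL_PER_EV

print(f"Cohesive energy per ion pair: {u_pair_eV:.2f} eV")
print(f"Cohesive energy per mole:     {u_mole_kcal:.0f} kcal/mole")
# Roughly 8 eV per pair, or about 180 kcal/mole, close to the
# Born-Mayer value listed for NaCl in the table.
```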
Cohesive energies for some alkali halides and crystals of other structures calculated in this way are shown in the table, where they can be compared
with the experimental values for the cohesive energy. The agreement is considered to be support for the essential validity of the Born-Madelung model. The model has been applied with some success even to the ammonium halides, assuming spherically symmetric ions. The Born-Mayer theory has been refined, with resulting improvement in the agreement between the calculated and experimental cohesive energies for alkali halides. The refinements have considered the small (a few kilocalories per mole or less) corrections to the cohesive energy arising from van der Waals interactions and zero-point vibrational energy. The van der Waals forces are weak attractive forces between ions due to mutually induced fluctuating dipoles. Similar forces, even weaker, due to dipole-quadrupole interactions have also been considered. Both these interactions make small positive contributions to the cohesive energy. At 0 K (−459.67°F) the lattice is not in static equilibrium but as a consequence of quantum mechanics is in a state of zero-point vibration with nonzero energy. The energy of these vibrational modes cannot be further reduced. The zero-point vibration energy gives a small negative contribution to the cohesive energy. The results of these refinements of the Born-Mayer theory are also shown in the table. See LATTICE VIBRATIONS. While it has had success in enabling the cohesive energy to be calculated, the shortcomings of this simple model become evident in its failure to predict correctly the elastic shear constants of ionic crystals. This requires interionic forces of a noncentral character, which are absent from the model. There are also other instances in which the simple model is found to be inadequate. More elaborate models which take into account features absent from the Born-Mayer model and which may be regarded as extensions of it have had considerable success in accounting for the elastic and dielectric properties of alkali halides. More important has been the ability of these models to account for the lattice phonons in these crystals. This has given the alkali halides a prominent place in the study of phonons in insulators, where they are among the few reasonably well-understood solids. B. Gale Dick Ionic conductivity. Most ionic crystals have large band gaps, and are therefore generally good electronic insulators. However, electrical conduction occurs by the motion of ions through these crystals. The presence of point defects, that is, deviations from ideal order in the crystalline lattice, facilitates this motion, thus giving rise to transport of electric charge. In an otherwise perfect lattice where all lattice sites are fully occupied, ions cannot be mobile. Mechanism. The two most common types of point defects that give rise to conductivity in ionic crystals are vacancies and interstitials. These intrinsic defects are in thermodynamic equilibrium, and their concentrations depend exponentially upon the temperature. The most common mechanisms by which
ions are conducted are vacancy diffusion and interstitial diffusion. In the former, a mobile ion positioned close to a vacancy may move into it. In the latter, the interstitial ion can change sites by hopping. Ionic crystals with Frenkel disorders (cation vacancy–cation interstitial pairs), such as silver halides, where the predominant defects are silver interstitials, have substantially greater values of ionic conductivity than those with Schottky disorders (anion-cation vacancy pairs), such as alkali halides and alkaline-earth halides, where the predominant defects are vacancies. Vacancies and interstitials can also be generated extrinsically by doping the ionic crystal with a suitable amount of an aliovalent (that is, of different valence from the host ion) species. This leads to the formation of vacancies or interstitial defects to counterbalance the excess charge brought in by the aliovalent ion since the crystal must preserve its charge neutrality. Superionic conductors. Many so-called normal ionic crystals possess conductivities of about 10⁻¹⁰ (ohm·cm)⁻¹ or lower at room temperature. However, a relatively small number of ionic materials, called superionic conductors or fast ionic conductors, display conductivities of the order of 10⁻¹ to 10⁻² (ohm·cm)⁻¹, which imply ionic liquidlike behavior. In most of these crystals, only one kind of ionic species is mobile, and its diffusion coefficient and mobility attain values such as found otherwise only in liquids. Due to their high value of ionic conductivity as well as their ability to selectively transport ionic species, superionic conductors have successfully been employed as solid electrolytes in many applications. See DIFFUSION; SOLID-STATE BATTERY. The high value of conductivity and diffusion coefficient for the mobile ionic species is due to the high degree of structural disorder in these materials. This disorder is so great that the mobile ions assume a quasimolten state, swimming in the open spaces in a rigid framework of the immobile ions. For example, the high-temperature α-phase of silver iodide (α-AgI), stable above 147°C (297°F), attains unusually high values of ionic conductivity, in excess of 1 (ohm·cm)⁻¹. The iodide ions occupy the body-centered cubic (bcc) sites and form the rigid framework. The silver ions are distributed among sites whose numbers are much greater than the number of silver ions. Similarly, RbAg4I5 has 56 tetrahedral interstitial positions for the 16 silver ions present in the structure to occupy. [This material possesses the highest ionic conductivity yet reported for a solid at room temperature, 0.3 (ohm·cm)⁻¹ at 300 K or 80°F.] The preexponential factor in the Arrhenius expression for the temperature dependence of the conductivity is appreciably smaller in superionic conductors than that exhibited by normal ionic materials. The activation enthalpy for conduction is generally small in superionic conductors, but can range over a wide spectrum.
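The temperature dependence just mentioned is usually written in the Arrhenius form σT = σ0 exp(−Ea/kBT), with preexponential factor σ0 and activation enthalpy Ea. The sketch below evaluates this form for two hypothetical materials; the parameter values are illustrative assumptions, chosen only so that the room-temperature results fall near the orders of magnitude quoted in the text.

```python
import math

K_B = 8.617e-5  # Boltzmann constant, eV/K

def ionic_conductivity(T, sigma0, Ea):
    """Arrhenius form sigma*T = sigma0*exp(-Ea/(kB*T)); sigma0 in (ohm*cm)^-1 * K."""
    return (sigma0 / T) * math.exp(-Ea / (K_B * T))

# Assumed illustrative parameters: a "normal" ionic crystal with a large
# activation enthalpy and a superionic conductor with a small one.
materials = {
    "normal ionic crystal": {"sigma0": 1.0e5, "Ea": 0.75},  # Ea in eV
    "superionic conductor": {"sigma0": 1.0e3, "Ea": 0.15},
}

for name, p in materials.items():
    for T in (300.0, 600.0):
        sigma = ionic_conductivity(T, p["sigma0"], p["Ea"])
        print(f"{name:22s} T = {T:5.0f} K  sigma = {sigma:9.2e} (ohm*cm)^-1")
# At 300 K these choices give roughly 1e-10 and 1e-2 (ohm*cm)^-1, respectively.
```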
For example, doped zirconium dioxide (ZrO2), which transports oxide ions by vacancy diffusion and is used at elevated temperatures between 600 and 1400°C (1100 and 2500°F), possesses a high activation enthalpy for ionic conduction. The oxygen vacancies in the lattice are produced by dissolution of dopants such as calcium oxide (CaO), magnesium oxide (MgO), or yttrium oxide (Y2O3) in zirconium dioxide. Since the Ca2+ ion substitutes for the Zr4+ site, one oxygen vacancy is produced for every Ca2+ ion. The typical amount of doping in this material is of the order of 10 mol %. This results in a very high concentration of oxygen vacancies of about 10²² cm⁻³, comparable to the electron density in metals. See ENTHALPY. Parameters and measurements. If charge and mass are transported by the same mobile species, then independent measurements of the ionic conductivity and the diffusion coefficient allow calculation of the concentration as well as the electrical mobility of that species. Even the most mobile ions in superionic conductors have mobilities that are orders of magnitude lower than that of electrons in metals. Hence the ionic conductivity of superionic conductors is still much below that of metals. An important consideration is the extent to which a mobile ionic species contributes to the total conduction process. The transference number ti for a species i is defined as the ratio of the partial (or specific) conductivity due to species i to the total conductivity. It can be determined by measuring the open-circuit potential across the ionic crystal in an electrochemical cell arrangement, such that a known difference in the chemical potential of species i is imposed on each side of the crystal at a suitable temperature. The ratio of the measured open-circuit potential to the theoretical potential calculated from the known chemical potential difference gives the transference number. Another method is to use an electrochemical titration technique with suitable slab electrodes that contain the mobile species. With passage of current, the mobile species i is transported across the ionic material. The weight change in the electrode material due to deposition or depletion of i is compared with the expected value calculated from Faraday's law. However, in all these measurements the interface between the electrode and the ionic material is very important. See CHEMICAL THERMODYNAMICS; ELECTROCHEMICAL TECHNIQUES; ELECTROLYSIS.
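The open-circuit method described above amounts to dividing a measured cell voltage by the voltage expected if the ions carried all of the current. A minimal sketch, assuming an oxygen concentration cell across a doped zirconia electrolyte and the Nernst expression for the theoretical potential; the oxygen partial pressures and the measured voltage are made-up illustrative numbers.

```python
import math

R = 8.314      # gas constant, J/(mol*K)
F = 96485.0    # Faraday constant, C/mol

def nernst_emf_oxygen(T, p_o2_high, p_o2_low):
    """Theoretical open-circuit EMF of an oxygen concentration cell (4 electrons per O2)."""
    return (R * T / (4.0 * F)) * math.log(p_o2_high / p_o2_low)

# Assumed illustrative conditions: air versus a low-oxygen gas at 1000 K.
T = 1000.0
e_theory = nernst_emf_oxygen(T, p_o2_high=0.21, p_o2_low=1.0e-6)
e_measured = 0.25   # hypothetical measured open-circuit potential, volts

t_ion = e_measured / e_theory   # transference number of the mobile (oxide) ion
print(f"Theoretical EMF: {e_theory:.3f} V")
print(f"Ionic transference number t_i = {t_ion:.2f}")
```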
In order to measure the ionic conductivity or the transference number of the mobile species i in an ionic material, an electrode material that contains species i is required to obtain an interface with the ionic material that is completely nonblocking (sometimes called reversible) to species i but completely blocking (that is, nonreversible) to all other species. By using suitable electrode materials the conductivities and the transference numbers of the minority carriers as well as the mobile species can be obtained. The easiest way to measure ionic conductivity is a two-probe dc technique with an electrochemical cell arrangement that has nonblocking interfaces for the species i on each side of the ionic material. In practice, there may be problems with slow electrode reactions that must be isolated and accounted for. A better way is the four-probe dc technique using four nonblocking electrodes, two as voltage probes and the other pair to pass a known and constant current. If a suitable nonblocking electrode material is not available, ac techniques may be used. A small-amplitude sinusoidal signal at reasonably high frequencies can be applied to the ionic material in an electrochemical cell arrangement, and the conductivity value extracted from the real part of the measured cell impedance. However, if the frequency used in the measurements is not high enough, the conductivity value may be in error. A more involved technique to eliminate this problem is ac impedance spectroscopy. Here a wide range of frequencies covering many decades is imposed upon the ionic material, and the frequency dispersion is analyzed in the complex impedance plane. Experience and insight are required for meaningful analyses of the frequency spectrum. See ALTERNATING-CURRENT CIRCUIT THEORY. Turgut M. Gür Dielectric properties. The polarization of an ionic crystal in an applied time-varying electric field depends on the frequency ν of the field. If ν = 0 the field is static, and a static equilibrium will be achieved in which the ions are displaced from their lattice sites a distance determined by a balance between the force of repulsion due to their nearest neighbors and the electrical force on the ions. Oppositely charged ions move in opposite directions with a resulting ionic polarization. In addition, the electron clouds of the ions are deformed in the local electric field at the ions and, to a much lesser extent, by the repulsive forces, to give an electronic polarization. As the frequency ν increases, inertial effects must be considered. When ν is sufficiently high, that is, when ν > ν0 (ν0 is the natural frequency of vibration of ions about lattice sites), the heavy ions can no longer follow the rapid variations of the applied field. The ionic polarization ceases to contribute to the total polarization, although the less massive electron clouds continue to give an electronic polarization. Eventually, with increasing frequency, ν exceeds the frequency at which the electron cloud deformations can follow the field, and the electronic polarization becomes altered. The frequency ν0, called the reststrahlen or residual-ray frequency, lies in the infrared region, and the associated vibrations of the charged ions are responsible for the strong absorption of infrared radiation in ionic crystals. The term residual-ray frequency comes from the fact that the reflectivity of visible and infrared radiation is highest for radiation of a frequency near to that of maximum absorption, ν0. Thus multiple reflection will yield radiation composed predominantly of residual rays of frequency near to ν0. For additional information on residual rays see REFLECTION OF ELECTROMAGNETIC RADIATION.
At frequencies below ν0 and above it up to the ultraviolet, an ideal alkali halide is transparent to electromagnetic radiation. Impurities or defects often in-
troduce absorption bands into these extensive and otherwise transparent regions and can thus be studied optically. This fact accounts for much of the wealth of experimental detail available on defects and impurities in alkali halides and for the interest which they have attracted. Ionic radii. It has been found that ionic radii can be chosen such that the lattice spacings of ionic crystals are given approximately by the sum of the radii of the constituent ions. Thus, for instance, eight radii suffice to give the approximate lattice spacings of the 16 alkali halides having sodium chloride structure. This is true because of the rapid variation of the repulsive interaction with distance, so that the ions interact somewhat like hard spheres. This implies that a given ion, having the same ionic radius in a variety of crystals, has an electron cloud that is only slightly altered by the differing crystalline environments of these crystals. B. Gale Dick Bibliography. N. W. Ashcroft and N. D. Mermin, Solid State Physics, 1976; H. Boettger et al., Hopping Conduction in Solids, 1986; S. Chandra, Superionic Solids: Principles and Applications, 1981; C. Kittel, Introduction to Solid State Physics, 7th ed., 1996; L. C. Pauling, The Nature of the Chemical Bond, 3d ed., 1960; F. Seitz et al., Solid State Physics, 40 vols. and 15 suppls., 1955–1987; A. Stoneham (ed.), Ionic Solids at High Temperatures, 1989; T. Takahashi (ed.), High Conductivity in Solid Ionic Conductors: Recent Trends and Applications, 1989.
Ionic equilibrium An equilibrium in a chemical reaction in which at least one ionic species is produced, consumed, or changed from one medium to another. Types of equilibrium. A few examples can illustrate the wide variety of types of ionic equilibrium which are known. Dissolution of an un-ionized substance. The dissolution of hydrogen chloride (a gas) in water (an ionizing solvent) can be used to illustrate this type. Reactions (1), (2), and (3) all represent exactly the same equilibrium.

HCl(g) ⇌ H+ + Cl−     (1)

HCl(g) + H2O ⇌ H3O+ + Cl−     (2)

HCl(g) + 4H2O ⇌ H9O4+ + Cl−     (3)

Equation (1) ignores the hydration of the proton and is preferred for many purposes when the hydration (or solvation) of the proton is irrelevant to a particular discussion. Equation (2) is written in recognition of the widely held belief that free protons do not exist in aqueous solution. Equation (3) indicates that another three molecules of water are very firmly bound to the H3O+ ion (the hydronium ion). There is no implication in Eq. (3), however, that the total number of molecules of water attached to, or weakly affected by, the hydronium ion may not be considerably larger than three.
Not much is known about the solvation of ions, although it has been proved that each chromic ion, Cr3+, in dilute aqueous solution holds at least six water molecules. Only a few other similar data have been clearly established. Hence equations for the other examples cited below are written without regard to solvation, except that the hydrogen ion is usually written in accordance with common practice as H3O+. See ACID AND BASE. Dissolution of a crystal in water. The dissociation of solid silver chloride, reaction (4), illustrates this type of equilibrium. See SOLUBILITY PRODUCT CONSTANT.

AgCl(crystal) ⇌ Ag+ + Cl−     (4)

Dissociation of a strong acid. Nitric acid, HNO3, dissociates as it dissolves in water, as in reaction (5).

HNO3 + H2O ⇌ H3O+ + NO3−     (5)

At 77°F (25°C) about one-half the acid is dissociated in a solution containing 10 (stoichiometric) moles of nitric acid per liter. Dissociation of an ion in water. The bisulfate ion, HSO4−, dissociates in water, as in reaction (6).

HSO4− + H2O ⇌ H3O+ + SO42−     (6)

About one-half the HSO4− is dissociated in an aqueous solution containing about 0.011 mole of sulfuric acid per liter at 77°F (25°C). Dissociation of water itself. In pure water at 77°F (25°C) the concentration of each ion is about 10⁻⁷ mole/liter, but increases rapidly as temperature is increased. This equilibrium is represented by reaction (7).

2H2O ⇌ H3O+ + OH−     (7)
Formation of a complex ion. In water or in a mixture of fused (chloride) salts, complex ions, such as ZnCl42−, may be formed, as in reaction (8). See COORDINATION COMPLEXES.

Zn2+ + 4Cl− ⇌ ZnCl42−     (8)

Dissociation of a weak acid. In water acetic acid dissociates to form hydrogen (hydronium) ion and acetate ion, as in reaction (9).

CH3CO2H + H2O ⇌ H3O+ + CH3CO2−     (9)

Electrochemical reaction. Reaction (10) takes place "almost reversibly" when the equilibrium shown exists.

1/2 H2(g) + AgCl(s) + H2O(l) ⇌ H3O+ + Cl− + Ag(s)     (10)

A small current is allowed to flow through an electric cell consisting of an aqueous solution of HCl saturated with silver chloride, a hydrogen electrode, and a silver electrode. Saturation is maintained by an excess of solid silver chloride which for convenience is sometimes mixed with the silver or plated, as a coating, on the metal. The electrode is then called a silver–silver chloride electrode.
Many additional types of equilibria could be mentioned, including those reactions occurring entirely in the gaseous phase and those reactions occurring between substances dissolved in two immiscible liquids. Quantitative relationships. Each reaction obeys an equilibrium equation of the type shown as Eq. (11).

([H+][Cl−]/[HCl(g)]) (f+f−/γg) = QcQf = K     (11)

The activity coefficient γg can be ignored here because it is very nearly unity. The terms f+ and f− are the respective activity coefficients of H+ and Cl− but cannot be determined separately. Their product can be determined experimentally and can also be calculated theoretically for very dilute solutions by means of the Debye-Hückel theory of interionic attraction. Because γg and f+f− are nearly unity, Eq. (11) demands that the pressure of HCl gas above a dilute aqueous solution be proportional to the square of the concentration of the solute. Qc is called the concentration quotient and Qf the quotient of activity coefficients. Similarly, the dissociation of acetic acid obeys Eq. (12), where fu is the activity coefficient of the un-ionized acetic acid.

([H+][CH3CO2−]/[CH3CO2H]) (f+f−/fu) = QcQf = K     (12)
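In dilute solution, where the quotient of activity coefficients is close to unity, Eq. (12) reduces to Qc ≈ K, and the degree of dissociation can be computed directly. A minimal sketch; the dissociation constant 1.8 × 10⁻⁵ for acetic acid at 25°C is a commonly quoted literature value, assumed here rather than taken from the text.

```python
import math

K_A = 1.8e-5   # dissociation constant of acetic acid at 25 deg C (assumed literature value)
C0 = 0.1       # total (stoichiometric) acetic acid concentration, mole/liter

# Qc = x^2 / (C0 - x) = K_A, with x = [H3O+] = [CH3CO2-]; solve the quadratic for x.
x = (-K_A + math.sqrt(K_A**2 + 4.0 * K_A * C0)) / 2.0

print(f"[H3O+] = {x:.2e} mole/liter")
print(f"Degree of dissociation = {x / C0:.1%}")
```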
Early work on electrolytes revealed that Qc, the concentration quotient, was constant within the limits of accuracy attainable at the time. Later work revealed that the measured concentration quotient Qc first increases as concentration is increased from very small values and then decreases sharply. The initial increase is due largely to the electrical forces between the ions which reduce the product f+f−. There is some evidence that the subsequent decrease in Qc and the concomitant rise in Qf are due to removal of some of the monomeric acetic acid by the formation of dimeric acetic acid, (CH3CO2H)2. The fact is, however, that activity coefficients in solutions other than very dilute ones are still not well understood. Even the experimental methods for the measurement of the molecular species involved in some equilibria were not evolved until recently. See ELECTROLYTIC CONDUCTANCE; HYDROLYSIS. Thomas F. Young Bibliography. G. M. Barrow, Physical Chemistry, 6th ed., 1996; H. Rossotti, The Study of Ionic Equilibria: An Introduction, 1978; G. H. Schenk and D. D. Ebbing, Qualitative Analysis and Ionic Equilibrium, 2d ed., 1990.
Ionic liquids In the simplest sense, the liquid phase that forms on the melting of an ionic compound (that is, a salt) and consists entirely of discrete ions (Fig. 1). More specifically, ionic liquids are the liquid phase of organic salts (organic, complex cations) which
Fig. 1. Examples of (a) inorganic and high-melting salts and (b) organic and low-melting salts known as ionic liquids. (After J. D. Holbrey et al., Crystal polymorphism in 1-butyl-3-methylimidazolium halides: Supporting ionic liquid formation by inhibition of crystallization, Chem. Commun., 2003:1636–1637, 2003)
melt at, or near, ambient temperature (below 100–150°C; 212–302°F). Ionic liquids chemically resemble both phase-transfer catalysts and surfactants and, in some cases, have been used for both purposes. See PHASE-TRANSFER CATALYSIS; SALT (CHEMISTRY); SURFACTANT. The most extensively studied ionic liquids are systems derived from 1,3-dialkylimidazolium (structure 1), tetraalkylammonium (2), tetraalkylphosphonium (3), and N-alkylpyridinium (4) cations.
Derivatives made from these cations can increase or decrease functionality, such as alkyl chains, branching, or chirality. Common anions that allow for
the formation of low-melting ionic liquids typically have diffuse (delocalized) charge and can range from simple inorganic anions such as chloride (Cl−), bromide (Br−), and iodide (I−), through larger pseudospherical polyatomic anions, including hexafluorophosphate [PF6]− and tetrafluoroborate [BF4]−, to larger, flexible fluorinated anions such as bis(trifluoromethylsulfonyl)amide [(CF3SO2)2N]− and tris(perfluoroalkyl)trifluorophosphate. A wide range of simple polyhalometallate and halometallate/halide complex anion ionic liquids have also been extensively investigated and have important application in electrochemistry and electrodeposition. The anionic component of the ionic liquid typically controls the solvent's reactivity with water, coordinating ability, and hydrophobicity. Anions can also contain chiral components or can be catalytically active, such as carboranes, polytungstates, and tetrachloroaluminate anions. Manipulation of the rheological properties through mixing of cation-anion pairs to obtain materials that support or enhance reactions makes the use of ionic liquids in organic synthesis an intriguing possibility. Synthesis. Preparation of the cationic portion of an ionic liquid can be achieved through the quaternization of phosphines or amines with a haloalkane or through protonation with a free acid. Quaternization is generally regarded as the
Fig. 2. X-ray crystal structure of 1-butyl-3-methylimidazolium chloride ([C4mim]Cl) in its ortho polymorph (its most common conformation). (After J. D. Holbrey et al., Crystal polymorphism in 1-butyl-3-methylimidazolium halides: Supporting ionic liquid formation by inhibition of crystallization, Chem. Commun., 2003:1636–1637, 2003. Reproduced by permission of The Royal Society of Chemistry.)
more sound approach because cations prepared through protonation reactions can be degraded easily through deprotonation, leading to the breakdown of the solvent. Reaction time and temperature for typical quaternization reactions depend upon both the haloalkane and the cation “backbone” used. The most widely used ionic liquids in research include 1-butyl-3-methylimidazolium chloride [C4mim]Cl, [C4mim][PF6], and [C4mim][BF4] (Fig. 2). Ionic liquids prepared using this method can be modified through anion exchange reactions, acid treatment, or metathesis reactions to prepare ionic liquids containing a desired anion. (The choice of anion is important, as it is the most influential factor on the physical properties of an ionic liquid.) Through these basic reactions, it is possible to prepare a wide range of ionic liquids possessing varied physical properties that can be used to aid in creating optimal reaction conditions. See QUATERNARY AMMONIUM SALTS. Purification of ionic liquids is of utmost importance, specifically those used in organic synthesis, as impurities from preparation could adversely affect the reactions. To synthesize pure ionic liquids, it is necessary to begin with purified starting materials. However, the use of pure starting materials does not ensure pure product. Common impurities in ionic liquids include halide anions from metathesis reactions, the presence of color and water, and acid impurities. Dichloromethane extraction is the most widely accepted method for removing excess halide anions from ionic liquids. Since ionic liquids typically have no measurable vapor pressure, dichloromethane can be removed easily under vacuum. As with organic solvents, water can also be
removed from an ionic liquid under vacuum with heat to drive off unwanted moisture. Ionic liquids normally exist as colorless solvents. Impurities that can cause an ionic liquid to become colored can be avoided by using pure starting materials, avoiding the use of acetone in the cleaning of glassware, and keeping temperatures as low as possible during the quaternization step of its synthesis. Properties. Interest in ionic liquids can be attributed to a number of unique qualities inherent to these materials. First, ionic liquids are excellent conducting materials that can be used in a number of electrochemical applications, including battery production and metal deposition. In fact, they were first designed to be used in electrochemical applications. Second, ionic liquids typically have no measurable vapor pressure, making them an attractive replacement for volatile organic solvents in synthesis. Third, ionic liquids are composed of ions that can be varied to create materials with vastly different physical and chemical properties, compared with conventional solvents. The ability to "fine-tune" the solvent properties of ionic liquids has been exploited to create solvents ideal for a range of organic syntheses. Lastly, ionic liquids exhibit varying degrees of solvation and solubility in a range of organic solvents, allowing simple separations and extractions. See SOLVENT. Use in organic reactions. The replacement of volatile organic solvents with ionic liquids in organic reactions has attracted considerable interest. Ionic liquids are polar solvents which can be used as both solvents and reagents in organic synthesis, and the possibility exists that ionic liquids may positively affect the outcome of the reactions. Certain ionic liquids are suitable for particular classes of organic reactions. For example, neutral ionic liquids are commonly used in Diels-Alder and condensation reactions, as well as nucleophilic displacement. Ionic liquids that possess Lewis acid properties are used in acid-catalyzed reactions, including Friedel-Crafts alkylations and acylations and electrophilic substitutions or additions. See ACID AND BASE; ACYLATION; DIELS-ALDER REACTION; FRIEDEL-CRAFTS REACTION; SUBSTITUTION REACTION. Product isolation from ionic liquids is generally accomplished through extraction with organic solvents. Because most ionic liquids have no measurable vapor pressure, simple vacuum techniques can be used to recover the product in the organic solvent. Biphasic systems composed of ionic liquids and water or organic phases have been used in catalysis reactions exploiting the properties of some ionic liquids to easily recover both product and catalyst. As stated previously, the ionic constituents of an ionic liquid can be "tuned" to exhibit desired physical properties to support such biphasic reactions. The unique properties of ionic liquids, such as their stability and nonvolatility, make them good candidates for use as solvents in homogeneous catalysis systems. Polymerization reactions in ionic liquids, using transition-metal catalysts and conventional
organic initiators, have been studied and have demonstrated increased efficiency compared with traditional polymerization solvents. See HOMOGENEOUS CATALYSIS. Green chemistry. Sustained interest in ionic liquids can be attributed to their desirable physical properties, such as their electrical conductivity, as well as their novelty and the possibility of enhanced reactions. The perceived potential to eliminate volatile organic solvents in synthetic and separation processes has also driven interest and investigation into ionic liquids. Ionic liquids have many characteristics relevant to a general "green chemistry" approach, including the lack of volatility; however, almost all other properties (such as toxicity, stability, and reactivity) vary with the cation and anion components and cannot readily be generalized. The utility and interest in ionic liquids, as a class of fluids, rests with individual examples displaying new, improved, or different combinations of solvent properties. See GREEN CHEMISTRY. M. B. Turner; J. D. Holbrey; S. K. Spear; R. D. Rogers Bibliography. R. D. Rogers and K. R. Seddon (eds.), Ionic Liquids: Industrial Applications for Green Chemistry, ACS Symp. Ser. 818, American Chemical Society, Washington, DC, 2002; R. D. Rogers and K. R. Seddon (eds.), Ionic Liquids as Green Solvents: Progress and Prospects, ACS Symp. Ser. 856, American Chemical Society, Washington, DC, 2003; P. Wasserscheid and T. Welton (eds.), Ionic Liquids in Synthesis, Wiley-VCH, Weinheim, 2002; T. Welton, Room temperature ionic liquids. Solvents for synthesis and catalysis, Chem. Rev., 99:2071, 1999.
Ionization The process by which an electron is removed from an atom, molecule, or ion. This process is of basic importance to electrical conduction in gases, liquids, and solids. In the simplest case, ionization may be thought of as a transition between an initial state consisting of a neutral atom and a final state consisting of a positive ion and a free electron. In more complicated cases, a molecule may be converted to a heavy positive ion and a heavy negative ion which are separated. Ionization may be accomplished by various means. For example, a free electron may collide with a bound atomic electron. If sufficient energy can be exchanged, the atomic electron may be liberated and both electrons separated from the residual positive ion. The incident particle could as well be a positive ion. In this case the reaction may be considerably more complicated, but may again result in a free electron. Another case of considerable importance is the photoelectric effect. Here a photon interacts with a bound electron. If the photon has sufficient energy, the electron may be removed from the atom. The photon is annihilated in the process. Other methods of ionization include thermal processes, chemical reactions, collisions of the second kind, and col-
lisions with neutral molecules or atoms. See ELECTRICAL CONDUCTION IN GASES; ELECTRODE POTENTIAL. Glenn H. Miller Bibliography. R. Gomer, Field Emission and Field Ionization, 1961, reprint 1993; H. Rossotti, The Study of Ionic Equilibria: An Introduction, 1978; D. Veza (ed.), Physics of Ionized Gases, 1995; A. Von Engel, Ionized Gases, 2d ed., 1965, reprint 1994.
Ionization chamber An instrument for detecting ionizing radiation by measuring the amount of charge liberated by the interaction of ionizing radiation with suitable gases, liquids, or solids. These radiation detectors have played an important part in the development of modern physics and have found many applications in basic scientific research, in industry, and in medicine. Principle of operation. While the gold leaf electroscope (Fig. 1) is the oldest form of ionization chamber, instruments of this type are still widely used as monitors of radiation by workers in the nuclear or radiomedical professions. In this device, two thin flexible pieces of gold leaf are suspended in a gasfilled chamber. When these are electrically charged, as in Fig. 1, the electrostatic repulsion causes the two leaves to spread apart. If ionizing radiation is incident in the gas, however, electrons are liberated from the gas atoms. These electrons then drift toward the positive charge on the gold leaf, neutralizing some of this charge. As the charge on the gold leaves decreases, the electrostatic repulsion decreases, and hence the separation between the leaves decreases. By measuring this change in separation, a measure is obtained of the amount of radiation incident on the gas volume. While this integrated measurement may be convenient for applications such as monitoring the total radiation exposure of humans, for many purposes it is useful to measure the ionization pulse produced by a single ionizing particle. See ELECTROSCOPE. The simplest form of a pulse ionization chamber consists of two conducting electrodes in a container
Fig. 1. Gold leaf electroscope used as a radiation detector.
Fig. 2. Parallel-plate ionization chamber.
filled with gas (Fig. 2). A battery, or other power supply, maintains an electric field between the positive anode and the negative cathode. When ionizing radiation penetrates the gas in the chamber—entering, for example, through a thin gas-tight window—this radiation liberates electrons from the gas atoms, leaving positively charged ions. The electric field present in the gas sweeps these electrons and ions out of the gas, the electrons going to the anode and the positive ions to the cathode. The basic ion chamber signal consists of the current pulse observed to flow as a result of this ionization process. Because the formation of each electron-ion pair requires approximately 30 eV of energy on the average, this signal is proportional to the total energy deposited in the gas by the ionizing radiation. Because the charge liberated by a single particle penetrating the chamber is small, very low-noise high-gain amplifiers are needed to measure this charge. In the early days, this was a severe problem, but such amplifiers have become readily available with the development of modern solid-state electronics. See AMPLIFIER. In a chamber, such as that represented in Fig. 2, the current begins to flow as soon as the electrons and ions begin to separate under the influence of the applied electric field. The time it takes for the full current pulse to be observed depends on the drift velocity of the electrons and ions in the gas. These drift velocities are complicated functions of gas type, voltage, and chamber geometry. However, because the ions are thousands of times more massive than the electrons, the electrons always travel several orders of magnitude faster than the ions. As a result, virtually all pulse ionization chambers make use of only the relatively fast electron signal. The electron drift velocities for a few gases are given in Fig. 3. Using one of the most common ion chamber gases—argon with a small amount of methane—with electrode spacings of a few centimeters and voltages of a few hundred volts, the electron drift time is of order a microsecond, while the positive-ion drift time is of order milliseconds. By using narrow-bandpass amplifiers sensitive only to signals with rise times of order a microsecond, only the electron signals are observed.
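The magnitude of the signal and the time scale quoted above are easy to estimate. A minimal sketch: the 30 eV per electron-ion pair is taken from the text, while the 2-cm gap and the 5 cm/µs drift velocity are assumed round numbers of the kind read off Fig. 3.

```python
E_DEPOSITED_EV = 6.0e6     # energy deposited by one particle, eV (e.g., a 6-MeV alpha)
W_PAIR_EV = 30.0           # average energy per electron-ion pair, eV (from the text)
E_CHARGE_C = 1.602e-19     # elementary charge, coulombs

GAP_CM = 2.0               # assumed anode-cathode spacing, cm
DRIFT_CM_PER_US = 5.0      # assumed electron drift velocity, cm/microsecond (order of Fig. 3)

n_pairs = E_DEPOSITED_EV / W_PAIR_EV
signal_charge_fC = n_pairs * E_CHARGE_C * 1e15
drift_time_us = GAP_CM / DRIFT_CM_PER_US

print(f"Electron-ion pairs:  {n_pairs:.0f}")
print(f"Signal charge:       {signal_charge_fC:.0f} fC")
print(f"Electron drift time: ~{drift_time_us:.1f} microseconds")
```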
Energy spectrum. One of the most important uses of an ionization chamber is to measure the total energy of a particle or, if the particle does not stop in the ionization chamber, the energy lost by the particle in the chamber. When such an energy-sensitive application is needed, a simple chamber geometry such as that shown in Fig. 2 is not suitable because the fast electron signal charge is a function of the relative distance that the ionization event occurred from the anode and cathode. If an ionization event occurs very near the cathode, the electrons drift across the full electric potential V0 between the chamber electrodes, and a full electron current pulse is recorded; if an ionization event occurs very near the anode, the electrons drift across a very small electric potential, and a small electron pulse is recorded. This geometrical sensitivity is a result of image charges induced by the very slowly moving positive ions. It can be shown that if the electrons drift through a potential difference V, the fast electron charge pulse is q′ = (V/V0)q, where q is the total ionization charge liberated in the gas. This geometrical dependence can be eliminated by introducing a Frisch grid as indicated in Fig. 4. This grid shields the anode from the positive ions and, hence, removes the effects of the image charges. By biasing the anode positively, relative to the grid, the electrons are pulled through the grid and collected on the anode. Now no signal is observed on the anode until the electrons drift through the grid, but the signal charge which is then observed is the full ionization charge q. While the ionization chamber generates only small quantities of signal charge for incident particles or photons of megaelectronvolt energies, the resulting signals are nevertheless well above the noise level of modern low-noise electronic amplifiers. When the signals generated by many incident particles of the same energy are individually measured, and a histogram is plotted representing the magnitude of a signal pulse versus the total number of pulses with that magnitude, then an energy spectrum results.
Fig. 3. Electron drift velocity in four different gases as a function of the ratio of the applied electric field strength in volts/centimeter to gas pressure in torrs. 1 torr = 133 Pa. (After H. W. Fulbright, Ionization chambers, Nucl. Instrum. Meth., 162:21–28, 1979)
Fig. 4. Frisch grid parallel-plate ionization chamber.
Fig. 5. Idealized energy spectrum produced by monoenergetic (6-MeV) alpha particles incident on an ideal gridded ionization chamber. The spectrum consists of a single gaussian peak with standard deviation σ.
Such a spectrum, smoothed out, consists of an essentially gaussian distribution with standard deviation σ (Fig. 5). Assuming a negligible contribution from amplifier noise, it might at first sight appear that σ should correspond to the square root of the average number of electron-ion pairs produced per incident particle. In fact, the variance σ² (expressed in terms of the number of pairs) is usually found to be smaller than this average number by a substantial factor, usually designated F, where F is the Fano factor. See PROBABILITY; STATISTICS.
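A short numerical sketch of what this means for a peak like that of Fig. 5: the 30 eV per electron-ion pair comes from the text, the Fano factor of 0.2 is an assumed illustrative value, and the factor 2.355 converting a gaussian standard deviation to FWHM is standard statistics.

```python
import math

E_DEPOSITED_EV = 6.0e6   # 6-MeV alpha particle, as in Fig. 5
W_PAIR_EV = 30.0         # average energy per electron-ion pair (from the text)
FANO = 0.2               # assumed illustrative Fano factor

n_mean = E_DEPOSITED_EV / W_PAIR_EV            # average number of electron-ion pairs
sigma_pairs = math.sqrt(FANO * n_mean)         # Fano-reduced fluctuation in pair number
sigma_energy_keV = sigma_pairs * W_PAIR_EV / 1e3
fwhm_keV = 2.355 * sigma_energy_keV            # gaussian FWHM = 2.355 * sigma

print(f"Mean number of pairs: {n_mean:.0f}")
print(f"sigma (energy):       {sigma_energy_keV:.1f} keV")
print(f"FWHM:                 {fwhm_keV:.1f} keV")
```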
Fig. 6. Heavy-ion detector telescope used to study nuclear reactions.
It is usual to express the width of an energy distribution such as that of Fig. 5 not in terms of σ but in terms of the "full width at half maximum," usually designated FWHM, or Δ. It can be shown that, for situations in which the width of the energy spectrum is governed by statistics alone, the FWHM is given by the equation Δ = 2.36√(FεE), where ε is the average energy required to create an electron-ion pair and E is the energy deposited in the chamber by each incident particle or photon. Values of the Fano factor F as low as 0.1 have been observed for certain gases. Gaseous ionization chambers. Because of the very few basic requirements needed to make an ionization chamber (namely, an appropriate gas with an electric field), a wide variety of different ionization chamber designs are possible in order to suit special applications. In addition to energy information, ionization chambers are now routinely built to give information about the position within the gas volume where the initial ionization event occurred. This information can be important not only in experiments in nuclear and high-energy physics where these position-sensitive detectors were first developed, but also in medical and industrial applications. This position-sensitivity capability results from the fact that, to a good approximation, electrons liberated in an ionizing event drift along the electric field line connecting the anode and cathode, and they drift with uniform velocity (Fig. 3). Hence, a measure of the drift time is a measure of the distance from the anode that the ionization occurred. A simple illustration of this use is a heavy-ion nuclear physics detector "telescope" used in the basic study of nuclear reactions (Fig. 6). Ionizing charged particles (such as ¹H, ⁴He, and ¹²C) produced in nuclear collisions enter the detector telescope through a thin gas-tight window at the left, pass through two Frisch grid ionization chambers, and then stop in a solid-state ionization detector. Measurement of the ionization produced in the gas versus the total energy of the particle as determined by the solid-state ionization detector gives sufficient information to uniquely identify the mass and atomic charge of the incident particle. Because the response of the solid-state detector is fast relative to the electron drift time, the difference in time of the signals from the solid-state detector and the anode determines the electron drift time and, hence, the distance above the grid that the particle entered the ionization chamber. This distance can be used to determine the nuclear scattering angle. Hence, a very simple device can be designed which gives several pieces of useful information. While this example illustrates the principles, very complex ionization chambers are now routinely used in heavy-ion and high-energy physics where tens or a hundred signals are recorded (using a computer) for a single ionization event. Position-sensitive heavy-ion detectors with active surfaces as large as a square meter have been developed.
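The drift-time-to-position conversion used in such a telescope is just distance = velocity × time, with the geometry then giving the angle. A minimal sketch: the 5 cm/µs drift velocity is of the order read from Fig. 3, while the measured time differences and the 20-cm target-to-detector distance are made-up illustrative numbers.

```python
import math

DRIFT_VELOCITY_CM_PER_US = 5.0   # electron drift velocity (order of Fig. 3), cm/microsecond
TARGET_DISTANCE_CM = 20.0        # assumed distance from target to the entrance window, cm

def entry_height_cm(delta_t_us):
    """Height above the Frisch grid at which the particle crossed the chamber."""
    return DRIFT_VELOCITY_CM_PER_US * delta_t_us

def scattering_angle_deg(delta_t_us):
    """Scattering angle, assuming the target sits in the horizontal plane of the grid."""
    return math.degrees(math.atan(entry_height_cm(delta_t_us) / TARGET_DISTANCE_CM))

for dt in (0.2, 0.5, 1.0):   # hypothetical measured drift-time differences, microseconds
    print(f"dt = {dt:.1f} us -> height = {entry_height_cm(dt):.1f} cm, "
          f"angle = {scattering_angle_deg(dt):.1f} deg")
```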
other applications. Foremost among these is the use of gas ionization chambers for radiation monitoring. Portable instruments of this type usually employ a detector containing approximately 1 liter of gas, and operate by integrating the current produced by the ambient radiation. They are calibrated to read out in convenient units such as milliroentgens per hour.

Another application of ionization chambers is the use of air-filled chambers as domestic fire alarms. These employ a small ionization chamber containing a low-level radioactive source, such as 241Am, which generates ionization at a constant rate, the resulting current being monitored by a small solid-state electronic amplifier. On the introduction of smoke into the chamber (which is open to the ambient air), the drifting charged ions tend to attach themselves to the smoke particles. This reduces the ionization chamber current, since the moving charge carriers are now much more massive than the initial ions and therefore exhibit correspondingly reduced mobilities. The observed reduction in ion current is used to trigger the alarm. See FIRE DETECTOR.

Another development in ion chamber usage was that of two-dimensional imaging in x-ray medical applications to replace the use of photographic plates. This imaging depends on the fact that if a large flat parallel-plate gas ionization chamber is illuminated with x-rays (perpendicular to its plane), the resulting charges will drift to the plates and thereby form an "image" in electrical charge of the point-by-point intensity of the incident x-rays. This image can be recorded xerographically by arranging for one plate to be a suitably charged insulator. This insulator is preferentially discharged by the collected ions. The resulting charge pattern is recorded by dusting the insulator with a fine powder and transferring this image to paper in the usual xerographic technique. Alternatively, the xerographic insulator may be a photoconductor, such as selenium, which is preferentially discharged by the ionization produced in the solid material. This is then an example of a solid ionization chamber, and its action closely parallels the operation of the optical xerographic copying machines. Such x-ray imaging detectors provide exceedingly high-quality images at a dosage to the patient substantially less than when photographic plates are used. See PHOTOCOPYING PROCESSES.

Gaseous ionization chambers have also found application as total-energy monitors for high-energy accelerators. Such applications involve the use of a very large number of interleaved thin parallel metal plates immersed in a gas inside a large container. An incident pulse of radiation, due for example to the beam from a large accelerator, will produce a shower of radiation and ionization inside the detector. If the detector is large enough, essentially all of the incident energy will be dissipated inside the detector (mostly in the metal plates) and will produce a corresponding proportional quantity of charge in the gas. By arranging that the plates are alternately
biased at a positive and negative potential, the entire device operates like a large interleaved gas ion chamber. The total collected charge is then a measure of the total energy in the initial incident pulse of radiation. Solid ionization chambers. Ionization chambers can be made where the initial ionization occurs, not in gases, but in suitable liquids or solids. In fact, the discovery of extremely successful solid-state ionization detectors in the early 1960s temporarily diverted interest from further developments of gas-filled chambers. In the solid-state ionization chamber (or solid-state detector) the gas filling is replaced by a large single crystal of suitably chosen solid material. In this case the incident radiation creates electron-hole pairs in the crystal, and this constitutes the signal charge. In practice, it has been found that only very few materials can be produced with a sufficiently high degree of crystalline perfection to allow this signal charge to be swept out of the crystal and collected. Although many attempts were made in this direction in the 1940s in crystal counters, using such materials as AgCl, CdS, and diamond, these were all failures due to the crystals not having adequate carrier transport properties. In the late 1950s, however, new attempts were made in this direction using single crystals of the semiconductors silicon and germanium. These were highly successful and led to detectors that revolutionized low-energy nuclear spectroscopy. There are two important differences between solid and gas-filled ionization chambers. First, it takes much less energy to create an electron-hole pair in a solid than it does to ionize gas atoms. Hence, the intrinsic energy resolution obtainable with solid-state detectors is better than with gas counters. Gammaray detectors with resolutions better than 180 eV are commercially available. Second, in the case of solid semiconductors, the positive charge is carried by electron holes whose mobilities are similar to those of electrons. Hence, both the electrons and holes are rapidly swept away by the electric field and, as a result, no Frisch grid is needed to electrically shield the anode from the image charge effects of slowmoving positive ions as in the case of gas- or liquidfilled ionization chambers. See CRYSTAL COUNTER; GAMMA-RAY DETECTORS; SEMICONDUCTOR. Liquid ionization chambers. The use of a liquid in an ionization chamber combines many of the advantages of both solid and gas-filled ionization chambers; most importantly, such devices have the flexibility in design of gas chambers with the high density of solid chambers. The high density is especially important for highly penetrating particles such as gamma rays. Unfortunately, until the 1970s the difficulties of obtaining suitable high-purity liquids effectively stopped development of these detectors. During the 1970s, however, a number of groups built liquid argon ionization chambers and demonstrated their feasibility. A Frisch grid liquid argon chamber achieved a resolution of 34 keV (FWHM).
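The resolution advantage of solid over gaseous chambers follows directly from the smaller energy needed per charge pair. The comparison below uses assumed round numbers (roughly 3 eV per electron-hole pair in a semiconductor versus roughly 30 eV per electron-ion pair in a gas); these particular values are illustrative and are not quoted in the text.

```python
import math

def sigma_ev(deposited_ev, eps_ev, fano=0.1):
    # Standard deviation of the collected charge, expressed in energy units.
    return math.sqrt(fano * eps_ev * deposited_ev)

E = 1.0e6  # a 1-MeV energy deposit, assumed for illustration
gas = sigma_ev(E, 30.0)
solid = sigma_ev(E, 3.0)
print(f"gas-filled chamber:  sigma ~ {gas:.0f} eV")
print(f"solid-state chamber: sigma ~ {solid:.0f} eV")
print(f"improvement factor:  {gas / solid:.1f}x")
```

The improvement factor is simply the square root of the ratio of the two pair-creation energies, about 3 for these assumed values.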
Fig. 7. Basic form of a simple single-wire gas proportional counter. (The labeled parts are the glass insulator, the tubular metal cathode, the fine central wire anode, and the gas filling.)
Proportional counters. If the electric field is increased beyond a certain point in a gas ionization chamber, a situation is reached in which the free electrons are able to create additional electron-ion pairs by collisions with neutral gas atoms. For this to occur, the electric field must be sufficiently high so that between collisions an electron can pick up an energy that exceeds the ionization potential of the neutral gas atoms. Under these circumstances gas multiplication, or avalanche gain, occurs, thereby providing additional signal charge from the detector. A variety of electrode structures have been employed to provide proportional gas gain of this type. The most widely used is shown in Fig. 7. Here a fine central wire acts as the anode, and the avalanche gain takes place in the high field region immediately surrounding this wire. In practice, under suitable circumstances, it is possible to operate at gas gains of up to approximately 10⁶. The gas gain is a function of the bias voltage applied to the proportional counter and takes the general form shown in Fig. 8. Similar avalanche multiplication effects can occur in semiconductor junction detectors, although there the situation is less favorable, and such devices have not found very widespread use except as optical detectors. See JUNCTION DETECTOR.
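As a rough numerical illustration of avalanche gain, the sketch below estimates the charge delivered to the anode for a single event; the deposited energy, pair-creation energy, and gain chosen here are assumed example values (the text states only that gains up to about 10⁶ are attainable).

```python
E_EV = 6000.0         # deposited energy, eV (assumed; e.g., a soft x-ray)
EPS_EV = 30.0         # energy per electron-ion pair, assumed typical gas value
GAS_GAIN = 1.0e4      # avalanche gain, within the range discussed above
E_CHARGE = 1.602e-19  # charge of the electron, coulombs

primary_pairs = E_EV / EPS_EV
signal_charge = primary_pairs * GAS_GAIN * E_CHARGE
print(f"primary electron-ion pairs: {primary_pairs:.0f}")
print(f"charge after avalanche:     {signal_charge:.2e} C")
```

Even a modest gain turns a few hundred primary electrons into a signal of a fraction of a picocoulomb, which is why proportional counters need far less external amplification than simple ionization chambers.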
Fig. 8. Plot of gas gain versus applied voltage for a gas-filled radiation detector. (The gain is close to 1.0 in the ionization chamber region and rises to roughly 10² to 10⁴ and beyond in the proportional counter region.)
The large gas gains realizable with proportional counters have made them extremely useful for research applications involving very low-energy radiation. In addition, their flexibility in terms of geometry has made it possible to construct large-area detectors, of the order of 10 ft² (1 m²), suitable for use as x-ray detectors in space. Essentially all that has been learned to date regarding x-ray astronomy has involved the use of such detectors aboard space vehicles. See X-RAY TELESCOPE.

Further exceedingly useful applications of gas proportional counters involve their use as position-sensitive detectors. In Fig. 7, for example, if the anode wire is grounded at both ends, then the signal charge generated at a point will split and flow to ground in the ratio of the resistance of the center wire between the point of origin and the two ends of the wire. This device therefore comprises a one-dimensional position-sensitive detector. Such devices are widely used as focal plane detectors in magnetic spectrographs. Similar position-sensitive operation can be obtained by taking account of the rise time of the signals seen at each end of the wire. Further extension of such methods allows two-dimensional detectors to be produced, a wide variety of which are under investigation for medical and other imaging uses.

The relatively large signals obtainable from gas proportional counters simplify the requirements of the subsequent amplifiers and signal-handling systems. This has made it economically feasible to employ very large arrays, on the order of thousands of such devices, in multidimensional configurations in high-energy physics experiments. By exploiting refinements of technique, it has proved possible to locate the tracks of charged particles to within a fraction of a millimeter over distances measured in meters. Such proportional counter arrays can operate at megahertz counting rates since they do not exhibit the long dead-time effects associated with spark chambers.

Geiger counters. If the bias voltage across a proportional counter is increased sufficiently, the device enters a new mode of operation in which the gas gain is no longer proportional to the initial signal charge but saturates at a very large, and constant, value. This provides a very economical method of generating signals so large that they need no subsequent amplification. See GEIGER-MÜLLER COUNTER.

The most widespread use of Geiger counters continues to be in radiation monitoring, where their large output signals simplify the readout problem. They have also found extensive use in cosmic-ray research, where again their large signals have made it feasible to use arrays of substantial numbers of detectors without excessive expenditures on signal-processing electronics. See PARTICLE DETECTOR. William A. Lanford

Bibliography. D. A. Bromley (ed.), Detectors in Nuclear Science, 1979; C. F. Delaney and E. C. Finch, Radiation Detectors: Physical Principles and Applications, 1992; K. Kleinknecht, Detectors for Particle Radiation, 2d ed., 1999; G. F. Knoll, Radiation Detection and Measurement, 3d ed., 1999.
Ionization potential
Ionophore
The potential difference through which a bound electron must be raised to free it from the atom or molecule to which it is attached. In particular, the ionization potential is the difference in potential between the initial state, in which the electron is bound, and the final state, in which it is at rest at infinity. The concept of ionization potential is closely associated with the Bohr theory of the atom. Although the simple theory is applicable only to hydrogenlike atoms, the picture furnished by it conveys the idea quite well. In this theory, the allowed energy levels for the electron are given by the equation below,
A substance that can transfer ions from a hydrophilic medium such as water into a hydrophobic medium, such as hexane or a biological membrane, where the ions typically would not be soluble, also known as an ion carrier. The ions transferred are usually metal ions [for example, lithium (Li+), sodium (Na+), potassium (K+), magnesium (Mg2+), or calcium (Ca2+)]; but there are ionophores that promote the transfer of other ions, such as ammonium ion (NH4+) or amines of biological interest. See ION. Ionophores were discovered in the early 1960s, when it was found that certain antibiotics, whose mechanism of action was unknown at the time, depended on the presence of ions such as potassium for their biological activity. Mitochondria isolated from cells would swell when placed in an aqueous medium containing small amounts of potassium ions and the antibiotic valinomycin. Measurement of the pH and potassium-ion concentration of the medium revealed that the potassium concentration decreased while the hydrogen concentration increased correspondingly. Thus, valinomycin was catalyzing the exchange of potassium ions in the medium for hydrogen ions present in the mitochondria. The swelling of the mitochondria was found to be due to an increase in their water content, which was necessary to rehydrate the potassium ions. The effect was reasonably specific for potassium ion, as smaller ions such as sodium and lithium were without effect; and, if either valinomycin or the potassium ions were independently eliminated from the experiment, no swelling was observed. Thus valinomycin had promoted the transfer of potassium ion across a hydrophobic cell membrane. Mechanism of transfer. There are two different mechanisms by which ionophores promote the transfer of ions across hydrophobic barriers: ionionophore complex formation and ion channel formation (see illus.). In complex formation, the ion forms a coordination complex with the ionophore in which there is a well-defined ratio (typically 1:1) of ion to ionophore. In these complexes the ionophore wraps around the ion so that the ion exists in the polar interior of the complex while the exterior is predominantly hydrophobic in character, and as such is
En = −k/n²        n = 1, 2, 3, . . .
where En is the energy of the state described by n. The constant k is about 13.6 eV for atomic hydrogen. The energy approaches zero as n becomes infinite. Thus zero energy is associated with the free electron. On the other hand, the most tightly bound case is given by setting n equal to unity. By the definition given above, the ionization potential for the most tightly bound, or ground, state is then 13.6 eV. The ionization potential for any excited state is obtained by evaluating En for the particular value of n associated with that state. For a further discussion of the energy levels of an atom see ATOMIC STRUCTURE AND SPECTRA; ELECTRONVOLT.

The ionization potential for the removal of an electron from a neutral atom other than hydrogen is more correctly designated as the first ionization potential. The potential associated with the removal of a second electron from a singly ionized atom or molecule is then the second ionization potential, and so on.

Ionization potentials may be measured in a number of ways. The most accurate measurement is obtained from spectroscopic methods. The transitions between energy states are accompanied by the emission or absorption of radiation. The wavelength of this radiation is a measure of the energy difference. The particular transitions that have a common final energy state are called a series. The series limit represents the transition from the free electron state to the particular state common to the series. The energy associated with the series limit transition is the ionization energy. Another method of measuring ionization potentials is by electron impact. Here the minimum energy needed for a free electron to ionize in a collision is determined. The accuracy of this type of measurement cannot approach that of the spectroscopic method. See ELECTRON CONFIGURATION; ELECTRONEGATIVITY; STRUCTURAL CHEMISTRY. Glenn H. Miller

Bibliography. H. C. Ohanian, Modern Physics, 2d ed., 1995; J. W. Rohlf, Modern Physics from A to Z, 1994; T. R. Sandin, Essentials of Modern Physics, 1989.
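A quick numerical check of the hydrogen energy-level relation, using the value k = 13.6 eV given above (a minimal sketch; the loop and formatting are illustrative only):

```python
K_EV = 13.6  # constant k for atomic hydrogen, in electronvolts

def level_energy_ev(n):
    """Energy of the nth level, En = -k / n**2."""
    return -K_EV / n**2

# The ionization potential from level n is the energy needed to reach E = 0.
for n in (1, 2, 3):
    print(f"n = {n}: En = {level_energy_ev(n):6.2f} eV, "
          f"ionization potential = {-level_energy_ev(n):5.2f} V")
```

The ground state (n = 1) gives the familiar 13.6 V first ionization potential of hydrogen, and the excited states give 3.40 V and 1.51 V.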
Two mechanisms by which ionophores promote ion transfer across cell membranes: (a) ion-ionophore complex formation and (b) ion channel formation.
soluble in nonpolar media. In these complexes the ion is coordinated by oxygen atoms present in the ionophore molecule through ion-dipole interactions. The ionophore molecule essentially acts as the solvent for the ion, replacing the aqueous solvation shell that normally surrounds the ion. An example of an ionophore that forms ion-ionophore complexes is nonactin (1). See COORDINATION COMPLEXES.

Ionophores that act via ion channel formation are found in biological environments. In this type of ionophore the molecule forms a polar channel in an otherwise nonpolar cell membrane, allowing passage of small ions either into or out of the cell. An example of an ion-channel-forming ionophore is amphotericin-B (2).
[Structural formulas: (1) nonactin, (2) amphotericin B, (3) monensin.]
Types. Ionophores were discovered as metabolites of microorganisms, but as interest in these compounds has increased, compounds possessing ionophoric activity have been designed and synthesized by chemists.

Naturally occurring compounds. There are several different types of naturally occurring ionophores. These have been classified into four classes, each of which has antibiotic activity. 1. Peptide ionophores are exemplified by the gramicidins, a family of polypeptide-containing an-
tibiotics. These compounds are thought to form ion channels via the dimerization of two gramicidin molecules to form a helical conformation in a cell membrane. Ions may then pass down the polar interior of the helix. 2. Valinomycin is an example of the cyclic depsipeptide class of ionophores. The molecule is a 36-membered macrocycle that is made up of a repetition of valine, hydroxyvaleric acid, and lactic acid residues. The structure of valinomycin complexed to potassium ion has been determined by x-ray crystallography. In this complex the conformation of the molecule is such that the potassium ion resides in the center of the complex coordinated to six carbonyl oxygens that are arranged in an octahedral configuration about the metal ion. 3. Nonactin (1) is a representative of the macrotetrolide class of ionophores. In this macrocycle the tetrahydrofuranyl oxygen atoms and two of the four lactone carbonyl oxygen atoms are responsible for chelation of metal ions. See CHELATION. 4. The polyether ionophores form a large family of structurally complex compounds. Monensin (3) is the best-known member of this class and was the first polyether ionophore in which the structure was determined by x-ray crystallography. Members of this class of ionophores have in common the presence of a carboxylic acid functional group (COOH) on one end of the molecule and a hydroxyl group (OH) on the other end. In solution these groups link to each other with hydrogen bonds, folding the molecule into a cyclic array. The metal ion sits in the center of the complex and is coordinated to the ether oxygen atoms of the tetrahydrofuran and tetrahydropyran rings. Synthetic compounds. In addition to the naturally occurring ionophores, a large number of compounds possessing ionophoric activity have been synthesized. The best-known synthetic ionophores are the crown ethers, which are cyclic polyethers containing the repeating units of ( O CH2 CH2 ). The name crown ethers is due to the resemblance of the molecule’s conformation to a king’s crown in which the oxygen atoms form the points of the crown. Simple crown ethers are named by the size of the ring and the number of oxygen atoms contained. For example, the name 18-crown-6 refers to an 18-membered ring containing 6 ether oxygen atoms. As with other ionophores, crown ether complex ions are coordinated through a series of ion-dipole interactions occurring between the ion and the ether oxygen atoms present in the ionophore. See MACROCYCLIC COMPOUND. Synthetic ionophores such as the crown ethers have proven to be useful reagents in synthetic organic chemistry. These reagents can solubilize inorganic salts, for example, potassium hydroxide (KOH), in nonpolar solvents such as toluene, in which they normally would be insoluble. This capability allows reactions involving these salts to be successfully conducted in the absence of water. An example of the utility of these reagents is the very efficient hydrolysis of the ester group in tert-butyl
mesitoate [reaction (1)], which is readily accom-
[Reaction (1): hydrolysis of tert-butyl mesitoate by KOH in toluene containing 18-crown-6; 94% yield.]
plished by potassium hydroxide in toluene in the presence of a potassium-complexing crown ether. No reaction occurs under identical conditions in the absence of the crown ether. Many inorganic reagents (generically denoted as M+X−) are found to be much more reactive when their reactions are conducted in nonpolar solvents containing crown ethers. The increased reactivity is due to the poor solvation of the anion (X−). Potassium fluoride (KF) provides an example of this effect. Normally, in aqueous solution the fluoride ion (F−) is heavily solvated by water molecules and is not very reactive. However, when potassium fluoride is dissolved in nonpolar solvents by employing a crown ether as the cation-solvating agent [reaction (2)], the
[Reaction (2): 18-crown-6 + KF in a nonpolar solvent, giving the K+-crown ether complex accompanied by an unsolvated F− ion.]
unsolvated fluoride ion becomes a very powerful nucleophile. See ELECTROPHILIC AND NUCLEOPHILIC REAGENTS.

Biological activity of antibiotics. The biological activity of ionophore antibiotics is due to their ability to disrupt the flow of ions either into or out of cells. Under normal conditions cells have a high internal concentration of potassium ions but a low concentration of sodium ions. The concentration of ions in the extracellular medium is just the reverse, high in sodium ions but low in potassium ions. This imbalance, which is necessary for normal cell function, is maintained by the presence of a specific transport protein (sodium-potassium ATPase) present in the cell membrane that pumps sodium ions out of the cell in exchange for potassium ions. Ionophore antibiotics possess the ability to disrupt this ionic imbalance by allowing ions to penetrate the cell membrane as ion-ionophore complexes or via the formation of ion channels. Gram-positive bacteria appear to be particularly sensitive to the effect of ionophores perturbing normal ion transport. See ION TRANSPORT. The pharmacological use of many of the ionophore antibiotics in humans is very restricted because of their high toxicity and multitude of physiological effects. See ANTIBIOTIC. Chris A. Veale

Bibliography. M. Dobler, Ionophores and Their Structures, 1981.
Ionosphere The part of the upper atmosphere that is sufficiently ionized that the concentration of free electrons affects the propagation of radio waves. Existence of the ionosphere was suggested simultaneously in 1902 by O. Heaviside in England and A. E. Kennelly in the United States to explain the transatlantic radio communication that was demonstrated the previous year by G. Marconi; and for many years it was commonly referred to as the Kennelly-Heaviside layer. The existence of the ionosphere as an electronically conducting region had been postulated earlier by B. Steward to explain the daily variations in the geomagnetic field. See IONIZATION; RADIO-WAVE PROPAGATION. The ionosphere has been extensively explored by rockets, satellites, and remote-sensing tools. The earliest technique involved the ionosonde, which utilizes a pulsed transmitter to send radio signals vertically upward while slowly sweeping the radio frequency. The ionosphere is a type of plasma, and such a system has a free oscillation mode. When the frequency of the signal transmitted by the ionosonde reaches a height where it equals the free oscillation frequency of the plasma, it reflects to Earth. Pulses reflected by the ionosphere are received at the transmitter and recorded; the elapsed time between pulse transmission and reception can be measured, and by using the speed of light it can be converted to an apparent distance to the point of reflection. The plasma density content can be found in this way by using an ionosonde, but no information is obtained above the point where the plasma density is maximum, so properties of the ionosphere above about 180 mi (300 km) altitude were impossible to determine. Major breakthroughs occurred in space science with the development of the artificial space satellite and sounding rockets. See METEOROLOGICAL ROCKET; OSCILLATION; PLASMA (PHYSICS). Another tool for ionospheric research is the incoherent scatter radar. Powerful radio sources, such a radar, can probe out to many thousand kilometers’ altitude and detect the weak signal scattered back to Earth by thermal fluctuations in the plasma. The first two of these instruments were built in 1960 near Arecibo, Puerto Rico, and Jicamarca, Peru; both are roughly 1000 ft (300 m) across. The Arecibo system is so sensitive that it is used for radio astronomy as well
447
Ionosphere as ionospheric and atmospheric science. Eight radars of this type form an observatory network stretching north-south in the Americas from Peru to Greenland and east-west from Japan to Europe. Using the Doppler technique, much like the radar in a police car, such a system can be used to measure the velocity of the plasma as well as its temperature and density. See DOPPLER RADAR; RADAR. Regions. The ionosphere is highly structured in the vertical direction. It was first thought that discrete layers were involved, referred to as the D, E, F1, and F2 layers; however, the layers actually merge with one another to such an extent that they are now referred to as regions rather than layers. The very high temperatures in the Earth’s upper atmosphere are colocated with the upper ionosphere (see illus.), since both are related to the effect of x-rays from the Sun. That is, the x-rays both ionize and heat the very uppermost portion of the Earth’s atmosphere. The ionosphere also displays striking geographic and temporal variations; the latter include regular diurnal, seasonal, and sunspot-cycle components and irregular day-to-day components that are associated mainly with variations in solar activity and atmospheric motion. See SUN. D region. The D region is the lowest ionospheric region, extending approximately from 35 to 50 mi (60 to 85 km). The upper portion is caused mainly by the ionization of nitric oxide by Lyman-alpha radiation in sunlight, and the lower portion is mainly due to ionization by cosmic radiation. The daytime electron concentrations are about 108–109 per cubic
meter. The region virtually disappears at night even though the cosmic radiation continues, because attachment of electrons to molecules (that is, formation of negative ions) quickly removes free electrons. This effect is suppressed during the daytime by photo detachment. The D region is the only ionospheric region in which negative ions are thought to be significant. See COSMIC RAYS. The collision frequency for electrons with heavier particles in the D region is relatively high, which causes absorption of energy from radio signals traveling through the region. This severely limits radio propagation and is responsible for the very limited daytime range for stations in the AM broadcast band; FM radio and television signals are at such a high frequency that they pass directly through the ionosphere with no reflection possible. At night, when the D region disappears, lower-frequency radio signals can reflect many times from the ionosphere and the Earth, leading to the vast propagation distances used by amateur radio operators for communication. See RADIO-WAVE PROPAGATION. The D region is located within the Earth’s atmospheric region known as the mesosphere (see illus.). The coldest temperatures on Earth and the highest clouds occur here. The latter occur in the summer polar regions where temperatures below −280◦F (100 K) have been recorded. These clouds are called noctilucent since they can be seen only at twilight, illuminated by sunlight at their height near 48 mi (80 km) and viewed from the dark surface of the Earth. The clouds are composed of ice; they are charged electrically and form a different type of
(b)
Atmospheric profiles. (a) Ionosphere, nighttime plasma density. (b) Thermal structure of the atmosphere. (After M. C. Kelley, The Earth’s Ionosphere: Plasma Physics and Electrodynamics, Academic Press, 1989)
Ionosphere material known as dusty or icy plasma. See MESOSPHERE. E region. Soft x-rays and the more penetrating portions of the extreme ultraviolet radiation from the Sun are absorbed in the altitude region from 50 to 85 mi 85 to (140 km), where they cause daytime electron concentrations of the order of 1011 per cubic meter in the E region (see illus.). This is the region from which ionospheric reflections were first identified. The principal ions have been observed to be the oxygen ions O2+ and O+ and the oxynitrogen ion NO+, the last presumably being formed in chemical reactions involving the primary oxygen ions. The soft x-rays that are principally responsible for the formation of the E region must also produce nitrogen ions (N2+); however, these are not observed because they are removed very rapidly by chemical reactions. When the Sun sets, these molecular ions react very rapidly with equal numbers of electrons, yielding oxygen and nitrogen atoms. These reactions are so fast that the E region virtually disappears at night unless there is an aurora present. This explains the low level of nighttime plasma density in the ionospheric profile. (illus. a). The Earth is continually bombarded by meteors which burn up and become ionized when they hit the top of the atmosphere. This leaves a layer of dust, metallic atoms, and metallic ions in the E region. Ions such as iron (Fe+), sodium (Na+), and silicon (Si+) do not recombine quickly because the atom cannot break apart. The result is the sporadic occurrence of long-lived clouds of metallic ion layers in the E region. The fading in and out of distant AM radio signals is due to their reflection to Earth from these sporadic E layers. See METEOR. The E region supports the largest electric currents in the near space regions of the Earth. Currents as high as 106 A flow in the polar E region during an intense aurora. In fact, the light associated with the most beautiful of the auroral displays is generated in this region of the ionosphere and atmosphere. The electrical power that is transferred to the atmosphere in such an event is about 100 GW. F region. The solar extreme-ultraviolet radiations produce the F region, above 85 mi (140 km). This radiation is most strongly absorbed near 95 mi (160 km). Above the region of maximum photoionization, the rate for the loss of electrons and ions decreases with altitude more rapidly than does the source of ionization, so that electron concentrations increase with altitude. This decrease in loss rate occurs because O+ ions recombine directly with electrons only very slowly in a two-body radiativerecombination process. Another loss process predominates instead, an ion-atom interchange reaction [reaction (1)], followed by a dissociative recombination [reaction (2)]. The loss rate is controlled by the O+ + N2 → NO+ + N
(1)
NO+ + e− → N + O
(2)
ion-atom interchange reaction. Since the N2 concentration decreases with altitude, the concentration of
O+ increases with altitude above the region where it is actually produced. The increase in O+ and electron concentration with altitude finally stops, because the atmospheric density becomes so low that there is a rapid downward flow of ions and electrons in the gravitational field. The region of maximum ionization concentration normally occurs near 180 mi (300 km), and it is known as the F peak. Most of the ionization that occurs above the peak is lost by flow into the dense atmosphere below, where the ionization can be lost by the ion-atom interchange reaction followed by dissociative recombination [reactions (1) and (2)]. The peak daytime plasma concentration in the F region is in the vicinity of 1012 ions per cubic meter. Above the peak, the distribution of ionization is in diffusive equilibrium in the gravitational field. The colocated atmospheric region (the thermosphere) has this property as well. Unlike the atmosphere below 60 mi (100 km), which is thoroughly mixed, the various atoms and ions can separate by their mass, the lighter particles extending to very high altitudes. Hydrogen and helium can, in fact, escape the Earth’s pull. The Earth’s neutral thermosphere reaches temperatures of well over 1500◦F (1188 K) and exchanges the associated energy with the plasma in the ionosphere. This high temperature is due to the absorption of the Sun’s deadly x-rays and extremeultraviolet light, which also creates the ionized gas. Just as the ozone layer absorbs ultraviolet light in the upper stratosphere and is colocated with a temperature increase, the thermospheric temperature rise marks an important filter of these high-energy solar photons. The thermosphere is so tenuous that the temperature at a given point responds quickly to solar heat input. The result is a huge temperature difference between night and day, as much as 900◦F (500 K). In response, intense winds exceeding 450 mi/h (200 m/s) blow back and forth from high to low temperature zones. As the Earth rotates under this pattern, an observer sees a strong component at a 24-h period. This is called a solar thermal tide in analogy to the 12-h periodicity exhibited by the Earth’s oceans responding to the gravitational attraction of the Moon. See MOON. The ionosphere responds to these neutral tidal motions, but with a behavior that is constrained by the Earth’s magnetic field and the electric currents that are generated in the conducting fluid. These wind-driven currents also create electric fields. The combined action of electric, gravitational, and magnetic fields, along with the neutral wind, controls the ionospheric material. See GEOMAGNETISM; THERMOSPHERE. Heliosphere and protonosphere. Helium and atomic hydrogen are important constituents of the upper atmosphere, and their ions are also important at levels above 300 mi (500 km) or so. These gases become important, and finally predominant, constituents of the upper atmosphere because of their low mass. In diffusive equilibrium, each neutral gas is distributed
449
450
Ipecac in the gravitational field just as if the other gases were not present, and the lighter ones therefore finally come to predominate over the heavier ones above some sufficiently high altitude. The terms heliosphere and protonosphere are sometimes used to designate the regions in which helium and hydrogen ions respectively are predominant. See HELIUM; HYDROGEN. At midlatitudes these light ions dominate above about 600 mi (1000 km). Also above this height the magnetic field geometry becomes crucial, and a toruslike region called the plasmasphere forms; it is full of dense cool plasma. The torus extends to about four earth radii (11,4000 mi or 19,000 km above the surface) and, following the curved magnetic field lines, this boundary touches the top of the atmosphere at all latitudes below about 60◦ magnetic latitude (southern Canada in the American Sector). This torus can be considered an extension of the ionosphere, since it is filled from below by the ionosphere during the day and empties back into the ionosphere during the night. The plasmasphere is relatively stable, since it rotates with the planet just as the Earth’s gaseous neutral atmosphere does. At the higher latitudes the ionosphere is put into motion by the solar wind and no longer rotates with the planet. See SOLAR WIND. Electrodynamics. In an electrical generator a conducting material is moved across a magnetic field, and the resulting magnetic force drives a current. If there is no external connection, though, the electric charge piles up at the ends of the conductor and an electric field is generated. The electric field attains a magnitude that balances the magnetic forces and cancels the current. If a light bulb is attached to the generator, a small current will flow. In this case an electric field will still exist, but it will be smaller than in the case with no external connection. The same effect occurs when the Earth’s winds blow across its magnetic field. A current is generated by the magnetic force. But when boundaries are encountered, an electric field will be created. These electric fields greatly affect the dynamics of the ionosphere. This wind-driven source of electric fields dominates the electrodynamics of the ionosphere below about 60◦ latitude. At higher latitudes the Earth’s atmosphere loses control of the plasma, and the plasma is subject to strong electrical forces induced by the flow of the solar wind past the Earth. Up to 200,000 V can be impressed across the Earth’s ionosphere when the Sun’s magnetic field is aligned opposite to the Earth’s. The high-latitude ionosphere is then put into a highspeed circulation characterized by two vortices in each hemisphere. The resulting electrical force can, through collisional forces, create a similar circulation in the upper-atmosphere neutral air. See SOLAR MAGNETIC FIELD. Tremendous variations occur in the ionosphere at high latitudes because of the dynamical effects of these electrical forces and because of the additional sources of plasma production. The most notable is the visual aurora, one of the most spectacular nat-
ural sights. Views of the Earth from both scientific and defense satellites have revealed the extent and beauty of the aurora against the background of cities. See MAGNETOSPHERE; MILITARY SATELLITES. The aurora has a poleward and equatorward limit during times of magnetic storms. A resident of the arctic regions of the Northern Hemisphere see the “northern” lights in their southern sky, for example. The aurora forms two rings around the poles of the Earth. The size of the rings waxes and wanes while wavelike disturbances propagate along its extent. See AURORA; UPPER-ATMOSPHERE DYNAMICS. Space plasma physics. The ionosphere is subject to the same disruptions that plague attempts to control nuclear fusion by magnetic confinement. By using the principles of plasma physics, it is possible to explain many of the processes that were known since the early twentieth century to seriously affect radio communication systems. The culprits are generically referred to as plasma instabilities, and they occur when stored energy is rapidly released. Typically such a process is associated with localized turbulent zones and an irregular ionosphere, similar to severe weather in the lower atmosphere. At low latitudes the most dominant process is the Rayleigh-Taylor instability, in which gravitational energy is released after sunset. Huge convective “thunderstorms” arise that race around the Earth, following the sunset terminator. Intense currents in the lower ionosphere also create large-amplitude plasma sound waves via the Farley-Buneman instability all over the Earth, but with two intense regions: one near the magnetic equator and one at high latitudes where the magnetic field is nearly vertical. Some of these instabilities are induced by neutral atmospheric waves that are generated in the dense lower atmosphere and propagate upward. These gravity or buoyancy waves have wavelengths of hundreds of kilometers and are created, among other things, by weather fronts, mountain ranges, earthquakes, nuclear explosions, and aurora. Even without their role as a source of plasma instabilities, these waves push the ionosphere around, creating both wavelike horizontal undulations and intense layers in the metallic materials from meteors, the sporadic E layers. See ATMOSPHERE. Michael C. Kelley; Francis S. Johnson Bibliography. K. Davies, Ionospheric Radio, 1990; A. Giraud and M. Petit, Ionospheric Techniques and Phenomena, 1978; A. Giraud and M. Petit, Physics of the Earth’s Ionosphere, 1978; J. K. Hargreaves, The Solar-Terrestrial Environment, 1992; M. C. Kelley, The Earth’s Ionosphere: Electrodynamics and Plasma Physics, 1989; J. A. Ratcliffe, An Introduction to the Ionosphere and Magnetosphere, 1972.
Ipecac A low, perennial shrub or half-shrub of the tropical forest in Brazil and Colombia. Several species are used, but the dried roots and rhizomes of Cephaelis
Iridium
Ipecac (Cephaelis ipecacuanha). (a) Entire plant. (b) Inflorescence. (c) Flower.
ipecacuanha (Rubiaceae) constitute the material recognized as the official drug ipecac (see illus.). Medically the principal use is as an emetic and an expectorant. See RUBIALES. P. D. Strausbaugh; Earl L. Core
Iridium A chemical element, Ir, atomic number 77, relative atomic weight 192.22. Iridium is a transition metal and shares similarities with rhodium as well as the other platinum metals, including palladium, platinum, ruthenium, and osmium. The atom in the gas phase has the electronic configuration 1s2, 2s2, 2p6, 3s2, 3p6, 3d10, 4s2, 4p6 4d10, 4f14, 5s2, 5p6, 5d7, 6s2. The ionic radius for Ir3+ is 0.068 nanometer and its metallic radius is 0.1357 nm. Metallic iridium is slightly less dense than osmium, which is the densest of all the elements. See PERIODIC TABLE. 1 1 H 3 Li 11 Na 19 K 37 Rb 55 Cs 87 Fr
2 4 Be 12 Mg 20 Ca 38 Sr 56 Ba 88 Ra
3 21 Sc 39 Y 71 Lu 103 Lr
4 22 Ti 40 Zr 72 Hf 104 Rf
lanthanide series actinide series
5 23 V 41 Nb 73 Ta 105 Db
6 24 Cr 42 Mo 74 W 106 Sg
7 25 Mn 43 Tc 75 Re 107 Bh
8 26 Fe 44 Ru 76 Os 108 Hs
9 27 Co 45 Rh 77 Ir 109 Mt
10 28 Ni 46 Pd 78 Pt 110 Ds
11 29 Cu 47 Ag 79 Au 111 Rg
12 30 Zn 48 Cd 80 Hg 112
13 5 B 13 Al 31 Ga 49 In 81 Tl 113
14 6 C 14 Si 32 Ge 50 Sn 82 Pb
15 16 7 8 N O 15 16 P S 33 34 As Se 51 52 Sb Te 83 84 Bi Po
57 58 59 60 61 62 63 64 65 La Ce Pr Nd Pm Sm Eu Gd Tb
66 67 Dy Ho
89 Ac
98 Cf
90 Th
91 Pa
92 93 94 95 96 97 U Np Pu Am Cm Bk
18 2 17 He 9 10 F Ne 17 18 Cl Ar 35 36 Br Kr 53 54 I Xe 85 86 At Rn
68 69 70 Er Tm Yb
99 100 101 102 Es Fm Md No
The abundance of iridium in the Earth’s crust is very low, 0.001 ppm. For mining purposes, it is generally found alloyed with osmium in materials known as osmiridium and iridiosmium, with iridium contents ranging from 25 to 75%.
Solid iridium is a silvery metal with considerable resistance to chemical attack. Upon atmospheric exposure the surface of the metal is covered with a relatively thick layer of iridium dioxide (IrO2). Important physical properties of metallic iridium are given in the table. Because of its scarcity and high cost, applications of iridium are severely limited. Although iridium metal and many of its complex compounds are good catalysts, no large-scale commercial application for these has been developed. In general, other platinum metals have superior catalytic properties. The high degree of thermal stability of elemental iridium and the stability it imparts to its alloys does give rise to those applications where it has found success. Particularly relevant are its high melting point (2443◦C or 4429◦F), its oxidation resistance, and the fact that it is the only metal with good mechanical properties that survives atmospheric exposure above 1600◦C (2910◦F). Iridium is alloyed with platinum to increase tensile strength, hardness, and corrosion resistance. However, the workability of these alloys is decreased. These alloys find use as electrodes for anodic oxidation, for containing and manipulating corrosive chemicals, for electrical contacts that are exposed to corrosive chemicals, and as primary standards for weight and length. Platinum-iridium alloys are used for electrodes in spark plugs that are unusually resistant to fouling by antiknock lead additives. Iridium-rhodium thermocouples are used for hightemperature applications, where they have unique stability. Very pure iridium crucibles are used for growing single crystals of gadolinium gallium garnet for computer memory devices and of yttrium aluminum garnet for solid-state lasers. The radioactive
Physical properties of iridium metal:
Crystal structure: face-centered cubic
Lattice constant a at 25°C, nm: 0.38394
Thermal neutron capture cross section, barns: 440
Density at 25°C, g/cm³: 22.560
Melting point: 2443°C (4429°F)
Boiling point: 4500°C (8130°F)
Specific heat at 0°C, cal/g: 0.0307
Thermal conductivity 0–100°C, cal cm/cm² s °C: 0.35
Linear coefficient of thermal expansion 20–100°C, µin./in./°C: 6.8
Electrical resistivity at 0°C, microhm-cm: 4.71
Temperature coefficient of electrical resistance 0–100°C, per °C: 0.00427
Tensile strength (1000 lb/in.²): soft, 160–180; hard, 300–360
Young's modulus at 20°C, lb/in.²: static, 75.0 × 10⁶; dynamic, 76.5 × 10⁶
Hardness, diamond pyramid number: soft, 200–240; hard, 600–700
Heat of fusion, kJ/mol: 26.4
Heat of vaporization, kJ/mol: 612
Heat of formation of monatomic gas, kJ/mol: 669
Electronegativity: 2.2
isotope, 192Ir, which is obtained synthetically from 191Ir by irradiation of natural sources, has been used as a portable gamma source for radiographic studies in industry and medicine. See HIGH-TEMPERATURE MATERIALS; PLATINUM. Bibliography. J. D. Atwood (ed.), Comprehensive Organometallic Chemistry II: Cobalt, Rhodium and Iridium, 1995; F. A. Cotton et al., Advanced Inorganic Chemistry, 6th ed., Wiley-Interscience, 1999; A. Earnshaw and N. Greenwood, Chemistry of the Elements, 2d ed., Butterworth-Heinemann, 1997.
Iron A chemical element, Fe, atomic number 26, and atomic weight 55.847. Iron is the fourth most abundant element in the crust of the Earth (5%). It is a malleable, tough, silver-gray, magnetic metal. It melts at 1540◦C, boils at 2800◦C, and has a density of 7.86 g/cm3. The four stable, naturally occurring isotopes have masses of 54, 56, 57, and 58. The two main ores are hematite, Fe2O3, and limonite, Fe2O3 · 3H2O. Pyrites, FeS2, and chromite, Fe(CrO2)2, are mined as ores for sulfur and chromium, respectively. Iron is found in many other minerals, and it occurs in groundwaters and in the red hemoglobin of blood. See PERIODIC TABLE. 1 1 H 3 Li 11 Na 19 K 37 Rb 55 Cs 87 Fr
2 4 Be 12 Mg 20 Ca 38 Sr 56 Ba 88 Ra
3 21 Sc 39 Y 71 Lu 103 Lr
4 22 Ti 40 Zr 72 Hf 104 Rf
lanthanide series actinide series
5 23 V 41 Nb 73 Ta 105 Db
6 24 Cr 42 Mo 74 W 106 Sg
7 25 Mn 43 Tc 75 Re 107 Bh
8 26 Fe 44 Ru 76 Os 108 Hs
9 27 Co 45 Rh 77 Ir 109 Mt
10 28 Ni 46 Pd 78 Pt 110 Ds
11 29 Cu 47 Ag 79 Au 111 Rg
12 30 Zn 48 Cd 80 Hg 112
13 5 B 13 Al 31 Ga 49 In 81 Tl 113
14 6 C 14 Si 32 Ge 50 Sn 82 Pb
15 16 7 8 N O 15 16 P S 33 34 As Se 51 52 Sb Te 83 84 Bi Po
57 58 59 60 61 62 63 64 65 La Ce Pr Nd Pm Sm Eu Gd Tb
66 67 Dy Ho
89 Ac
98 Cf
90 Th
91 Pa
92 93 94 95 96 97 U Np Pu Am Cm Bk
18 2 17 He 9 10 F Ne 17 18 Cl Ar 35 36 Br Kr 53 54 I Xe 85 86 At Rn
68 69 70 Er Tm Yb
99 100 101 102 Es Fm Md No
The greatest use of iron is for structural steels; cast iron and wrought iron are made in quantity, also. Magnets, dyes (inks, blueprint paper, rouge pigments), and abrasives (rouge) are among the other uses of iron and iron compounds. See CAST IRON; IRON ALLOYS; IRON METALLURGY; STAINLESS STEEL; STEEL MANUFACTURE; WROUGHT IRON. There are several allotropic forms of iron. Ferrite or α-iron is stable up to 760◦C (1400◦F). The change of β-iron involves primarily a loss of magnetic permeability because the lattice structure (body-centered cubic) is unchanged. The allotrope called γ -iron has the cubic close-packed arrangements of atoms and is stable from 910 to 1400◦C (1670 to 2600◦F). Little is known about δ-iron except that it is stable above 1400◦C (2600◦F) and has a lattice similar to that of α-iron.
The metal is a good reducing agent and, depending on conditions, can be oxidized to the 2+, 3+, or 6+ state. In most iron compounds, the ferrous ion, iron(II), or ferric ion, iron(III), is present as a distinct unit. Ferrous compounds are usually light yellow to dark green-brown in color; the hydrated ion, Fe(H2O)62+, which is found in many compounds and in solution, is light green. This ion has little tendency to form coordination complexes except with strong reagents such as cyanide ion, polyamines, and porphyrins. The ferric ion, because of its high charge (3+) and its small size, has a strong tendency to hold anions. The hydrated ion, Fe(H2O)63+, which is found in solution, combines with OH−, F−, Cl−, CN−, SCN−, N3−, C2O42−, and other anions to form coordination complexes. See COORDINATION CHEMISTRY. An interesting aspect of iron chemistry is the array of compounds with bonds to carbon. Cementite, Fe3C, is a component of steel. The cyanide complexes of both ferrous and ferric iron are very stable and are not strongly magnetic in contradistinction to most iron coordination complexes. The cyanide complexes form colored salts. See TRANSITION ELEMENTS. John O. Edwards Bibliography. F. A. Cotton et al., Advanced Inorganic Chemistry, 6th ed., Wiley-Interscience, 1999; N. N. Greenwood and A. Earnshaw, Chemistry of the Elements, 2d ed., 1997; J. Silver (ed.), Chemistry of Iron, 1993.
Iron alloys Solid solutions of metals, one metal being iron. A great number of commercial alloys have iron as an intentional constituent. Iron is the major constituent of wrought and cast iron and wrought and cast steel. Alloyed with usually large amounts of silicon, manganese, chromium, vanadium, molybdenum, niobium (columbium), selenium, titanium, phosphorus, or other elements, singly or sometimes in combination, iron forms the large group of materials known as ferroalloys that are important as addition agents in steelmaking. Iron is also a major constituent of many special-purpose alloys developed to have exceptional characteristics with respect to magnetic properties, electrical resistance, heat resistance, corrosion resistance, and thermal expansion. Table 1 lists some of these alloys. See ALLOY; FERROALLOY; STEEL. Because of the enormous number of commercially available materials, this article is limited to the better-known types of alloys. Emphasis is on special-purpose alloys; practically all of these contain relatively large amounts of an alloying element or elements referred to in the classification. Alloys containing less than 50% iron are excluded, with a few exceptions. Iron-aluminum alloys. Although pure iron has ideal magnetic properties in many ways, its low electrical resistivity makes it unsuitable for use in alternatingcurrent (ac) magnetic circuits. Addition of aluminum in fairly large amounts increases the electrical
Iron alloys
453
TABLE 1. Some typical composition percent ranges of iron alloys classified by important uses∗ Type Heat-resistant alloy castings Heat-resistant cast irons Corrosion-resistant alloy castings Corrosion-resistant cast irons Magnetically soft materials
Permanent-magnet materials
Low-expansion alloys
Fe Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. Bal. 61–53
C 0.30–0.50 0.20–0.75 1.8–3.0 1.8–3.0 0.15–0.50 0.03–0.20 1.2–4.0 1.8–3.0 — — — — — — — — — — — — — — 0.5–2.0
Mn — — 0.3–1.5 0.4–1.5 1 max 1.5 max 0.3–1.5 0.4–1.5 — — — — — — — — — — — — 0.15 0.24 0.5–2.0
Si 1–2 2–2.5 0.5–2.5 1.0–2.75 1 1.5–2.0 0.5–3.0 1.0–2.75 0.5–4.5 — — — — — — — — — — — 0.33 0.03 0.5–2.0
Cr
Ni
Co
W
8–30 10–30 15–35 1.75–5.5 11.5–30 18–27 12–35 1.75–5.5
0–7 8–41 5 max 14–30 0–4 8–31 5 max 14–32
— —
— 0.5 max
— —
— —
4 max 1 max
— — — — — — — — — — — — — 4–5
— — — — — 20 17 25 28 14 15 36 42 33–35
— — — 12 12 5 12.5 – 5 24 24
— — — — — — — — — — —
3.5 — — 17 20 — — — — — —
—
Mo
1
Al
—
Cu
Ti
7
3 max 7 max 16 16 12
12 10 12 12 8 8
6
3 3
1.25
1–3
∗This table does not include any AISI standard carbon steels, alloy steels, or stainless and heat-resistant steels or plain or alloy cast iron for ordinary engineering uses; it includes only alloys containing at least 50% iron, with a few exceptions. Bal. = balance percent of composition.
resistivity of iron, making the resulting alloys useful in such circuits. Three commercial iron-aluminum alloys having moderately high permeability at low field strength and high electrical resistance nominally contain 12% aluminum, 16% aluminum, and 16% aluminum with 3.5% molybdenum, respectively. These three alloys are classified as magnetically soft materials; that is, they become magnetized in a magnetic field but are easily demagnetized when the field is removed. The addition of more than 8% aluminum to iron results in alloys that are too brittle for many uses because of difficulties in fabrication. However, addition of aluminum to iron markedly increases its resistance to oxidation. One steel containing 6% aluminum possesses good oxidation resistance up to 2300◦F (1300◦C). See ALUMINUM ALLOYS. Iron-carbon alloys. The principal iron-carbon alloys are wrought iron, cast iron, and steel. Wrought iron of good quality is nearly pure iron; its carbon content seldom exceeds 0.035%. In addition, it contains 0.075–0.15% silicon, 0.10 to less than 0.25% phosphorus, less than 0.02% sulfur, and 0.06–0.10% manganese. Not all of these elements are alloyed with the iron; part of them may be associated with the intermingled slag that is a characteristic of this product. Because of its low carbon content, the properties of wrought iron cannot be altered in any useful way by heat treatment. See WROUGHT IRON. Cast iron may contain 2–4% carbon and varying amounts of silicon, manganese, phosphorus, and sulfur to obtain a wide range of physical and mechanical properties. Alloying elements (silicon, nickel, chromium, molybdenum, copper, titanium, and so on) may be added in amounts varying from a few
tenths to 30% or more. Many of the alloyed cast irons have proprietary compositions. See CAST IRON; CORROSION. Steel is a generic name for a large group of iron alloys that include the plain carbon and alloy steels. The plain carbon steels represent the most important group of engineering materials known. Although any iron-carbon alloy containing less than about 2% carbon can be considered a steel, the American Iron and Steel Institute (AISA) standard carbon steels embrace a range of carbon contents from 0.06% maximum to about 1%. In the early days of the American steel industry, hundreds of steels with different chemical compositions were produced to meet individual demands of purchasers. Many of these steels differed only slightly from each other in chemical composition. Studies were undertaken to provide a simplified list of fewer steels that would still serve the varied needs of fabricators and users of steel products. The Society of Automotive Engineers (SAE) and the AISI both were prominent in this effort, and both periodically publish lists of steels, called standard steels, classified by chemical composition. These lists are published in the SAE Handbook and the AISI’s Steel Products Manuals. The lists are altered periodically to accommodate new steels and to provide for changes in consumer requirements. There are minor differences between some of the steels listed by the AISI and SAE. Only the AISI lists will be considered here. The standard steels represent a large percentage of all steel produced and, although considerably fewer in number, have successfully replaced the large number of specialized compositions formerly used. A numerical system is used to indicate grades of standard steels. Provision also is made to use certain letters of the alphabet to indicate the steel-making
Type Carbon steels
Constructional alloy steels
Stainless and heat-resisting steels
Series designation∗ 10xx 11xx 12xx 13xx 23xx 25xx 31xx 33xx 40xx 41xx 43xx 46xx 47xx 48xx 50xx 51xx 5xxxx 61xx 86xx 87xx 92xx 93xx 98xx 2xx 3xx 4xx
5xx ∗The
Composition Nonresulfurized carbon steel Resulfurized carbon steel Rephosphorized and resulfurized carbon steel Manganese 1.75% Nickel 3.50% Nickel 5.00% Nickel 1.25%, chromium 0.65% Nickel 3.50%, chromium 1.55% Molybdenum 0.25% Chromium 0.50 or 0.95%, molybdenum 0.12 or 0.20% Nickel 1.80%, chromium 0.50 or 0.80%, molybdenum 0.25% Nickel 1.55 or 1.80%, molybdenum 0.20 or 0.25% Nickel 1.05%, chromium 0.45%, molybdenum 0.20% Nickel 3.50%, molybdenum 0.25% Chromium 0.28 or 0.40% Chromium 0.80, 0.90, 0.95, 1.00 or 1.05% Carbon 1.00%, chromium 0.50, 1.00, or 1.45% Chromium 0.80 or 0.95%, vanadium 0.10 or 0.15% min Nickel 0.55%, chromium 0.50 or 0.65%, molybdenum 0.20% Nickel 0.55%, chromium 0.50%, molybdenum 0.25% Manganese 0.85%, silicon 2.00% Nickel 3.25%, chromium 1.20%, molybdenum 0.12% Nickel 1.00%, chromium 0.80%, molybdenum 0.25% Chromium-nickel-manganese steels; nonhardenable, austenitic, and nonmagnetic Chromium-nickel steels; nonhardenable, austenitic, and nonmagnetic Chromium steels of two classes: one class hardenable, martensitic, and magnetic; the other nonhardenable, ferritic, and magnetic Chromium steels; low chromium heat resisting
x's are replaced by actual numerals in defining a steel grade, as explained in the text.
process, certain special additions, and steels that are tentatively standard, but these are not pertinent to this discussion. Table 2 gives the basic numerals for the AISI classification and the corresponding types of steels. In this system the first digit of the series designation indicates the type to which a steel belongs; thus 1 indicates a carbon steel, 2 indicates a nickel steel, and 3 indicates a nickel-chromium steel. In the case of simple alloy steels, the second numeral usually indicates the percentage of the predominating alloying element. Usually, the last two (or three) digits indicate the average carbon content in points, or hundredths of a percent. Thus, 2340 indicates a nickel steel containing about 3% nickel and 0.40% carbon. All carbon steels contain minor amounts of manganese, silicon, sulfur, phosphorus, and sometimes other elements. At all carbon levels the mechanical properties of carbon steel can be varied to a useful degree by heat treatments that alter its microstructure. Above about 0.25% carbon steel can be hardened by heat treatment. However, most of the carbon steel produced is used without a final heat treatment. See HEAT TREATMENT (METALLURGY). Alloy steels are steels with enhanced properties attributable to the presence of one or more special elements or of larger proportions of manganese or silicon than are present ordinarily in carbon steel. The major classifications of alloy steels are high-strength, low-alloy; AISI alloy; alloy tool; heat-resisting; electrical; and austenitic manganese. Some of these iron
alloys are discussed briefly in this article; for more detailed attention see STAINLESS STEEL; STEEL. Iron-chromium alloys. An important class of iron-chromium alloys is exemplified by the wrought stainless and heat-resisting steels of the type 400 series of the AISI standard steels, all of which contain at least 12% chromium, which is about the minimum chromium content that will confer stainlessness. However, considerably less than 12% chromium will improve the oxidation resistance of steel for service up to 1200◦F (650◦C), as is true of AISI types 501 and 502 steels that nominally contain about 5% chromium and 0.5% molybdenum. A comparable group of heat- and corrosion-resistant alloys, generally similar to the 400 series of the AISI steels, is covered by the Alloy Casting Institute specifications for cast steels. Corrosion-resistant cast irons alloyed with chromium contain 12–35% of that element and up to 5% nickel. Cast irons classified as heat-resistant contain 15–35% chromium and up to 5% nickel. During World War I a high-carbon steel used for making permanent magnets contained 1–6% chromium (usually around 3.5%); it was developed to replace the magnet steels containing tungsten that had formerly been used but could not then be made because of a shortage of tungsten. See MAGNET. Iron-chromium-nickel alloys. The wrought stainless and heat-resisting steels represented by the type 200 and the type 300 series of the AISI standard steels are an important class of iron-chromium-nickel
alloys. A comparable series of heat- and corrosion-resistant alloys is covered by specifications of the Alloy Casting Institute. Heat- and corrosion-resistant cast irons contain 15–35% chromium and up to 5% nickel. Iron-chromium-aluminum alloys. Electrical-resistance heating elements are made of several iron alloys of this type. Nominal compositions are 72% iron, 23% chromium, 5% aluminum; and 55% iron, 37.5% chromium, 7.5% aluminum. The iron-chromium-aluminum alloys (with or without 0.5–2% cobalt) have higher electrical resistivity and lower density than nickel-chromium alloys used for the same purpose. When used as heating elements in furnaces, the iron-chromium-aluminum alloys can be operated at temperatures of about 2350◦F (1290◦C) maximum. These alloys are somewhat brittle after elevated temperature use and have a tendency to grow or increase in length while at temperature, so that heating elements made from them should have additional mechanical support. Addition of niobium (columbium) reduces the tendency to grow. Because of its high electrical resistance, the 72% iron, 23% chromium, 5% aluminum alloy (with 0.5% cobalt) can be used for semiprecision resistors in, for example, potentiometers and rheostats. See ELECTRICAL RESISTANCE; RESISTANCE HEATING. Iron-cobalt alloys. Magnetically soft iron alloys containing up to 65% cobalt have higher saturation values than pure iron. The cost of cobalt limits the use of these alloys to some extent. The alloys also are characterized by low electrical resistivity and high hysteresis loss. Alloys containing more than 30% cobalt are brittle unless modified by additional alloying and special processing. Two commercial alloys with high permeability at high field strengths (in the annealed condition) contain 49% cobalt with 2% vanadium, and 35% cobalt with 1% chromium. The latter alloy can be cold-rolled to a strip that is sufficiently ductile to permit punching and shearing. In the annealed state, these alloys can be used in either ac or dc applications. The alloy of 49% cobalt with 2% vanadium has been used in pole tips, magnet yokes, telephone diaphragms, special transformers, and ultrasonic equipment. The alloy of 35% cobalt with 1% chromium has been used in high-flux-density motors and transformers as well as in some of the applications listed for the higher-cobalt alloy. Although seldom used now, two high-carbon alloys called cobalt steels were formerly used for making permanent magnets. One contained 17% cobalt, 2.5% chromium, 8.25% tungsten; the other contained 36% cobalt, 5.75% chromium, 3.75% tungsten. These are considered magnetically hard materials as compared to the magnetically soft materials discussed in this article. Iron-manganese alloys. The important commercial alloy in this class is an austenitic manganese steel (sometimes called Hadfield manganese steel after its inventor) that nominally contains 1.2% carbon and 12–13% manganese. This steel is highly resistant to abrasion, impact, and shock.
Iron-nickel alloys. The iron-nickel alloys discussed here exhibit a wide range of properties related to their nickel contents. Nickel content of a group of magnetically soft materials ranges from 40 to 60%; however, the highest saturation value is obtained at about 50%. Alloys with nickel content of 45–60% are characterized by high permeability and low magnetic losses. They are used in such applications as audio transformers, magnetic amplifiers, magnetic shields, coils, relays, contact rectifiers, and choke coils. The properties of the alloys can be altered to meet specific requirements by special processing techniques involving annealing in hydrogen to minimize the effects of impurities, grain-orientation treatments, and so on. Another group of iron-nickel alloys, those containing about 30% nickel, is used for compensating changes that occur in magnetic circuits due to temperature changes. The permeability of the alloys decreases predictably with increasing temperature. Low-expansion alloys are so called because they have low thermal coefficients of linear expansion. Consequently, they are valuable for use as standards of length, surveyors’ rods and tapes, compensating pendulums, balance wheels in timepieces, glass-tometal seals, thermostats, jet-engine parts, electronic devices, and similar applications. The first alloy of this type contained 36% nickel with small amounts of carbon, silicon, and manganese (totaling less than 1%). Subsequently, a 39% nickel alloy with a coefficient of expansion equal to that of low-expansion glasses and a 46% nickel alloy with a coefficient equal to that of platinum were developed. Another important alloy is one containing 42% nickel that can be used to replace platinum as lead-in wire in light bulbs and vacuum tubes by first coating the alloy with copper. An alloy containing 36% nickel and 12% chromium has a constant modulus of elasticity and low expansivity over a broad range of temperatures. Substitution of 5% cobalt for 5% nickel in the 36% nickel alloy decreases its expansivity. Small amounts of other elements affect the coefficient of linear expansion, as do variations in heat treatment, cold-working, and other processing procedures. A 9% nickel steel is useful in cryogenic and similar applications because of good mechanical properties at low temperatures. Two steels (one containing 10–12% nickel, 3–5% chromium, about 3% molybdenum, and lesser amounts of titanium and aluminum and another with 17–19% nickel, 8–9% cobalt, 3– 3.5% molybdenum, and small amounts of titanium and aluminum) have exceptional strength in the heattreated (aged) condition. These are known as maraging steels. Cast irons containing 14–30% nickel and 1.75– 5.5% chromium possess good resistance to heat and corrosion. See THERMAL EXPANSION. Iron-silicon alloys. There are two types of ironsilicon alloys that are commercially important: the magnetically soft materials designated silicon or electrical steel, and the corrosion-resistant, high-silicon cast irons.
Most silicon steels used in magnetic circuits contain 0.5–5% silicon. Alloys with these amounts of silicon have high permeability, high electrical resistance, and low hysteresis loss compared with relatively pure iron. Most silicon steel is produced in flat-rolled (sheet) form and is used in transformer cores, stators, and rotors of motors, and so on that are built up in laminated-sheet form to reduce eddy-current losses. Silicon-steel electrical sheets, as they are called commercially, are made in two general classifications: grain-oriented and nonoriented. The grain-oriented steels are rolled and heat-treated in special ways to cause the edges of most of the unit cubes of the metal lattice to align themselves in the preferred direction of optimum magnetic properties. Magnetic cores are designed with the main flux path in the preferred direction, thereby taking advantage of the directional properties. The grain-oriented steels contain about 3.25% silicon, and they are used in the highest-efficiency distribution and power transformers and in large turbine generators. The nonoriented steels may be subdivided into low-, intermediate-, and high-silicon classes. Low-silicon steels contain about 0.5–1.5% silicon and are used principally in rotors and stators of motors and generators; steels containing about 1% silicon are also used for reactors, relays, and small intermittent-duty transformers. Intermediate-silicon steels contain about 2.5–3.5% silicon and are used in motors and generators of average to high efficiency and in small- to medium-size intermittent-duty transformers, reactors, and motors. High-silicon steels contain about 3.75–5% silicon and are used in power transformers, in communications equipment, and in the highest-efficiency motors, generators, and transformers. High-silicon cast irons containing 14–17% silicon and sometimes up to 3.5% molybdenum possess corrosion resistance that makes them useful for acid-handling equipment and for laboratory drain pipes. Iron-tungsten alloys. Although tungsten is used in several types of relatively complex alloy (including high-speed steels not discussed here), the only commercial alloy made up principally of iron and tungsten was a tungsten steel containing 0.5% chromium in addition to 6% tungsten that was used up to the time of World War I for making permanent magnets. Hard-facing alloys. Hard-facing consists of welding a layer of metal of special composition on a metal surface to impart some special property not possessed by the original surface. The deposited metal may be more resistant to abrasion, corrosion, heat, or erosion than the metal to which it is applied. A considerable number of hard-facing alloys are available commercially. Many of these would not be considered iron alloys by the 50% iron content criterion adopted for the iron alloys in this article, and they will not be discussed here. Among the iron alloys are low-alloy facing materials containing chromium as the chief alloying element, with smaller amounts of
manganese, silicon, molybdenum, vanadium, tungsten, and in some cases nickel to make a total alloy content of up to 12%, with the balance iron. Highalloy ferrous materials containing a total of 12–25% alloying elements form another group of hard-facing alloys; a third group contains 26–50% alloying elements. Chromium, molybdenum, and manganese are the principal alloying elements in the 12–25% group; smaller amounts of molybdenum, vanadium, nickel, and in some cases titanium are present in various proportions. In the 26–50% alloys, chromium (and in some cases tungsten) is the principal alloying element, with manganese, silicon, nickel, molybdenum, vanadium, niobium (columbium), and boron as the elements from which a selection is made to bring the total alloy content within the 26–50% range. Permanent-magnet alloys. These are magnetically hard ferrous alloys, many of which are too complex to fit the simple compositional classification used above for other iron alloys. As already mentioned in discussing iron-cobalt and iron-tungsten alloys, the high-carbon steels (with or without alloying elements) are now little used for permanent magnets. These have been supplanted by a group of sometimes complex alloys with much higher retentivities. The ones considered here are all proprietary compositions. Two of the alloys contain 12% cobalt and 17% molybdenum and 12% cobalt and 20% molybdenum. Members of a group of six related alloys contain iron, nickel, aluminum, and with one exception cobalt; in addition, three of the cobalt-containing alloys contain copper and one has copper and titanium. Unlike magnet steels, these alloys resist demagnetization by shock, vibration, or temperature variations. They are used in magnets for speakers, watt-hour meters, magnetrons, torque motors, panel and switchboard instruments, and so on, where constancy and degree of magnet strength are important. See ALLOY STRUCTURES; MAGNETISM. Harold E. McGannon Bibliography. American Society for Testing and Materials, Annual Book of ASTM Standards, 1992, vol. 01.02: Ferrous Castings; Ferroalloys, 1992; F. E. Fletcher, High-Strength Low-Alloy Steels: Status, Selection, and Physical Metallurgy, 1982; R. A. Oriani et al. (eds.), Hydrogen Dehydration of Ferrous Alloys, 1985; Y. S. Touloukian and C. Y. Ho (eds.), Properties of Selected Ferrous Alloying Elements, vol. 3, 1989.
Iron metabolism A nearly continuous cycle whereby iron from organic and inorganic sources is converted to form iron-porphyrin compounds, which can be utilized by the body. One such compound, heme, is the iron-bearing component of hemoglobin; more than 60% of the iron in the body is used in hemoglobin metabolism. Iron is also essential for other heme compounds, such as myoglobin and cytochromes, and for a wide variety of
nonheme enzymes, including many in the citric acid cycle. Hemoglobin metabolism involves formation and breakdown. Formation is initiated by the process of absorption. Unlike most elements, iron has no specific mechanism for excretion so that absorption must be closely monitored to control body content and ensure replacement of the daily loss. The normal daily requirements of iron are 1 mg for males, 2 mg for young women (to cover the loss from menstruation), and 3 mg for pregnant women. From an average diet, an absorption rate of 5–10% is possible. Absorption occurs best from animal and plant hemes and less from inorganic ferric salts; different mechanisms are involved for the two types. Many factors influence absorption, especially gastric acid, which can solubilize iron salts and prevent their precipitation in the duodenum. In the duodenum, iron rapidly enters the mucosal cells of the intestinal villi, where the iron is released from heme by the enzyme heme oxygenase. There, the ferrous iron destined for the formation of hemoglobin in the developing red cells of the marrow (the erythroblasts) is converted to ferric ions by ceruloplasmin and is attached to the transport glycoprotein transferrin. The nonassimilated iron remains in the intestinal cell and is combined with the storage protein apoferritin to form ferritin, which is lost by the body when the mucosal cells are shed after their 3–5-day life cycle. The mechanism by which the mucosal cell knows what to discard and what to assimilate is unknown. Absorption is increased by iron deficiency, anemia of many varieties, hypoxia, and increased red cell production (erythropoiesis). Once ferric iron is attached to transferrin, it circulates in the blood until it attaches to transferrin receptors on immature red blood cells in the marrow. There may be as many as 300,000 receptors on the surface of an individual cell. Once attached, the transferrin receptor complex is taken in by endocytosis, and the iron is released and reduced to ferrous ions. The transferrin receptor complex returns to the cell surface, and the transferrin reenters the blood plasma. The iron enters the mitochondria and is inserted into protoporphyrin by the enzyme ferrochelatase to form heme, which when combined with the protein globin forms the respiratory pigment hemoglobin. Mature red blood cells cannot take up iron; at the end of their 120-day life span they are engulfed by the monocyte-macrophage cells in liver and spleen, where the iron is released by the enzyme heme oxygenase. Sixty percent of this iron is rapidly returned to the marrow to produce red blood cells, while the remainder is stored as ferritin in the labile pool for release as needed. Iron excessively absorbed from the gut or released from the labile pool and not destined for the formation of red blood cells may enter the storage compartment as ferritin or hemosiderin. Apoferritin, the iron-free storage protein, exists in most living cells, and ferric ions can be stored in its hollow sphere to form a growing crystal of ferric oxyhydroxide (FeOOH). Small amounts of apoferritin en-
ter the circulatory system at levels directly paralleling those of the stores. Hemosiderin occurs in the monocyte macrophages of the liver and spleen as FeOOH stripped of its apoferritin shell. Both storage forms may be rapidly released on demand for increased red blood cell production. Excessive amounts of iron may be stored as hemosiderin following multiple transfusions of blood in non-iron-deficient anemias, in the disease hemochromatosis, and following undue breakdown of cells. See BLOOD; HEMOGLOBIN. John Murray Bibliography. W. J. Williams et al. (eds.), Hematology, 5th ed., 1995.
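The requirement and absorption figures quoted above determine how much dietary iron must actually be eaten each day. The following minimal sketch is illustrative only; it simply applies the 5–10% absorption range given in this article.

```python
# How much dietary iron must be eaten to absorb the daily requirement,
# using the requirement and absorption figures quoted above.

requirements_mg = {"adult male": 1.0, "young woman": 2.0, "pregnant woman": 3.0}
absorption_range = (0.05, 0.10)   # 5-10% of dietary iron is absorbed

for person, need_mg in requirements_mg.items():
    low = need_mg / absorption_range[1]    # best-case absorption
    high = need_mg / absorption_range[0]   # worst-case absorption
    print(f"{person}: {low:.0f}-{high:.0f} mg of dietary iron per day")
```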
Iron metallurgy Extracting iron from ores and preparing it for use. Extraction involves the conversion of naturally occurring iron-bearing minerals into metallic iron. The term ironmaking is commonly used to include all of the industrial processes that convert raw materials into iron. The major process for the production of iron is the iron blast furnace, which varies widely in size and specific features. However, the growth of alternative direct-reduction processes has been very significant (Fig. 1). The principal difference between the blast furnace process and the directreduction processes is the temperature of operation. In the blast furnace, high operating temperatures enable the production of molten iron. At the lower operating temperatures of the direct-reduction processes, solid or sponge iron is produced. Most of the iron produced in the world is used in the production of steel. The remainder is converted to iron castings, ferroalloys, and iron powder. An alternative method for ironmaking is direct smelting. These processes rely on the direct reaction of noncoking coal, oxygen, and iron ore in a high-temperature smelter to produce molten iron. In this sense, they fall between the blast furnace and direct reduction (Fig. 1). The primary impetus for these processes is the elimination of coking in the production of iron from coal. See FERROALLOY. Ore preparation. The major ore deposits contain iron in the form of oxides, such as hematite (Fe2O3) and magnetite (Fe3O4). These oxides are physically separated from the other constituents in the ore by sequences of operations, including crushing and fine grinding, magnetic and flotation concentration, and dewatering. The separation of iron oxide concentrates in this manner is a branch of a technology called mineral beneficiation or ore dressing. The iron oxide particles in the concentrates are almost as fine in size as talcum powder (−200 mesh). The gangue constituents of the ore are also finely divided, and this has caused major problems in the development of environmentally suitable disposal methods. The iron oxide concentrates must be consolidated into hard pellets, sintered lumps, or briquettes to become suitable charge materials for the iron blast furnace and most of the direct-reduction processes. In general, the ore dressing and consolidation operations
Fig. 1. The position of ironmaking processes (blast furnace and direct reduction) in the steelmaking sequence. (American Iron and Steel Institute)
are performed near the mining sites. See FLOTATION; ORE DRESSING. Reduction of oxide concentrates. The conversion of iron oxide to metallic iron is accomplished by the application of heat and reducing agents. In sequence, hematite (Fe2O3), magnetite (Fe3O4), and wüstite (FeO) are reduced by carbon monoxide (CO) and hydrogen (H2) as shown in reactions (1)–(3).

1/2 Fe2O3 + 1/6 CO → 1/3 Fe3O4 + 1/6 CO2   (1a)
1/2 Fe2O3 + 1/6 H2 → 1/3 Fe3O4 + 1/6 H2O   (1b)
1/3 Fe3O4 + 1/3 CO → FeO + 1/3 CO2   (2a)
1/3 Fe3O4 + 1/3 H2 → FeO + 1/3 H2O   (2b)
FeO + CO → Fe + CO2   (3a)
FeO + H2 → Fe + H2O   (3b)

The sum of reactions (1a), (2a), (3a) yields reaction (4), which represents an overall reaction for the reduction of hematite to metallic iron with carbon monoxide.

1/2 Fe2O3 + 3/2 CO → Fe + 3/2 CO2   (4)

The reducing agents, CO and H2, are derived from coal, natural gas, oil, wood, and other carbonaceous or organic materials. The manner in which the reducing agents are obtained from these raw materials is complex. For example, the iron blast furnace process uses coke (Fig. 1). When coal is heated in coke ovens, a fuel gas is released, leaving a residue of hard carbonaceous material called coke. The released gas is primarily methane (CH4) with small amounts of other recoverable organic species. After processing, the residual gas is a useful fuel. The hard coke charged at the top of the blast furnace descends slowly to the bosh region (Fig. 2), where it is burned with hot air blasted through the tuyeres. In the presence of excess coke the combustion reaction is given by reaction (5).

C + 1/2 O2 → CO   (5)

In most blast furnaces, hydrocarbons (oil, gas, tar, and so on) are added to the hot blast to provide H2, which reduces the required amount of coke to produce a unit of iron. For example, where methane is added to the hot blast, hydrogen is generated by reaction (6).

CH4 + 1/2 O2 → CO + 2H2   (6)

In addition, small but significant amounts of CO and H2 are regenerated from CO2 and H2O at higher levels in the stack by solution loss reactions represented by reactions (7).

CO2 + C → 2CO   (7a)
H2O + C → CO + H2   (7b)

The sources of the CO2 and H2O for these reactions are reactions (1)–(3), which occur simultaneously. As the gases and solids move countercurrently, the descending solid oxides release their oxygen to the ascending gas stream by means of the reduction reactions. The solution loss reactions provide some regeneration of CO and H2 to maintain the reducing strength of the ascending gases. See COKE.
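A rough worked example shows the scale of reductant that reaction (4) implies. The sketch below is for orientation only; it ignores the hydrogen reactions, the solution loss reactions, and all practical inefficiencies, so actual coke rates are higher than the theoretical carbon requirement it computes.

```python
# Minimum CO and C per kilogram of Fe from reaction (4):
#   1/2 Fe2O3 + 3/2 CO -> Fe + 3/2 CO2
# This ignores the H2 reactions (1b)-(3b), the solution loss reactions (7),
# and every practical inefficiency, so real coke rates are higher.

M_FE = 55.85   # g/mol, iron
M_C = 12.01    # g/mol, carbon
M_CO = 28.01   # g/mol, carbon monoxide

mol_fe = 1000.0 / M_FE           # moles of Fe in 1 kg
mol_co = 1.5 * mol_fe            # reaction (4): 3/2 mol CO per mol Fe
kg_co = mol_co * M_CO / 1000.0
kg_c = mol_co * M_C / 1000.0     # carbon locked up in that CO

print(f"CO required: {kg_co:.2f} kg per kg Fe")   # about 0.75 kg
print(f"C  required: {kg_c:.2f} kg per kg Fe")    # about 0.32 kg
```

The theoretical figure of roughly 0.32 kg of carbon per kilogram of iron is consistent with the coke rates near 0.40 kg/kg quoted later in this article for efficient furnaces using fuel injection.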
Fig. 2. Iron blast furnace. (After J. G. Peacey and W. G. Davenport, The Iron Blast Furnace, Pergamon, 1979)
For the direct-reduction processes, CO and H2 are produced from the available raw materials in a variety of ways. Several of the rotary kiln processes utilize the direct charging of coal with iron oxide concentrates. The fixed carbon and the volatile matter of the coal are partially burned in place with the iron oxides to generate the reducing gases. Several of the shaft, retort, and fluidized-bed processes utilize the separate generation of reducing gas from natural gas. In some direct smelting processes, pulverized coal is injected into a molten iron/slag emulsion to provide reductant (fixed carbon and hydrocarbon gases). Simultaneously, iron oxide fines and pellets are fed into the slag, where they dissolve readily. An oxygen lance provides oxidant to burn hydrocarbons and fixed carbon. The overall reaction system in direct smelting is basically the same as that in the blast furnace; however, it is as if the blast furnace were compressed into just the bosh and hearth regions. The process depends upon the fluid dynamic conditions in the iron/slag emulsion, also referred to as a foam. Material balances. The principal raw materials used in the extraction of iron are oxide concentrates, reducing agents, air, water and steam, auxiliary fuels used directly or converted to electrical energy, and fluxes (Fig. 3). The most common flux used in the iron blast furnace is limestone (CaCO3). Large modern furnaces are extremely efficient and values of coke consumption as low as 0.40 kg of coke per kilogram of iron (800 lb/short ton) have been achieved with oil injection. The material balance provides an indication of the relative magnitudes of the major
input and output items. Direct-reduction and direct-smelting processes have features in common with the blast furnace; however, the input and output statements may appear to be different to an experienced observer. The quantity of air required per unit of iron produced in the blast furnace is an impressive figure. At least 1 m3 of air must be blown into the blast furnace or direct-reduction process for each kilogram of iron produced. The exhaust gases from the blast furnace contain considerable amounts of H2 and CO. In modern plants, material balances are monitored by sophisticated computer techniques. It is essential to keep the system operating productively and efficiently. In an integrated steel plant, interruptions in the production of iron from the blast furnace can upset the entire sequence of operations. In this respect, the integrated steel plant is similar to the assembly line for any manufactured product. See PRESSURIZED BLAST FURNACE. Energy balances and temperature profiles. Total energy consumption in steelmaking is much lower than in the production of other common metals (Table 1). To some extent the higher thermal efficiency of ironmaking may be attributed to the large scale of iron and steel operations. Another important factor is that recycling and reuse of waste materials and energy sources have been extensively developed (Table 2). Estimated temperature profiles have been established (Fig. 4). The temperature of the descending solids rises rapidly to about 1000 K (1340◦F) in the top portion of the furnace, rises gradually

Fig. 3. Simplified material balance for an iron blast furnace, per 1000 kg (2200 lb) of molten metal (4.5% C, 0.8% Si, 0.3% Mn, 0.03% S, 0.1% P). Inputs: pellets, sinter, and ore 1600 kg (3520 lb); flux 150 kg (330 lb); coke 450 kg (990 lb); air blast 1300 kg (2860 lb); pure O2 40 kg (88 lb); hydrocarbon fuel 50 kg (110 lb). Outputs: top gas 2300 kg (5060 lb); dust 10 kg (22 lb); slag 300 kg (660 lb). (After J. G. Peacey and W. G. Davenport, The Iron Blast Furnace, Pergamon, 1979)
TABLE 1. Approximate energy requirements for the extraction of metals from ores

Metal        Theoretical, 10² kJ/kg    Actual, 10³ kJ/kg
Iron                 3.7                     19.4
Copper               1.6                     54.0
Aluminum            18.3                    236.0
Titanium            12.0                    501.0

TABLE 2. Approximate simplified energy balance for a blast furnace

Parameter                                                          10³ kJ/kg
Energy input
  Calorific value of coke                                            14.87
  Sensible heat of hot air blast                                      1.95
  Calorific value of injected fuel                                    1.12
  Total                                                              17.94
Energy output
  Calorific value of recoverable top gas                              7.40
Energy consumed
  Heat for reactions, sensible heat of slag and iron, and losses     10.54
to about 1200 K (1700◦F) in the central portion, and rises rapidly to temperatures in excess of the fusion point in the lower portion. The radial temperature gradient is very steep at the midheight of the furnace, where the temperature at the vertical axis may be as high as 1900 K (2960◦F), while the wall temperature is only 1200 K (1700◦F). The residence time of the ascending gases in the furnace is of the order of 1 s, while that of the descending solids is of the order of 104 s. Uniformity of gas and solid movement in the furnace is highly desirable. Excessive channeling, erratic flow, and formation of static zones can lead to major losses in efficiency and productivity. Composition of products. Although meteoric iron containing nickel was used before 4000 B.C., it is believed that iron was not extracted from ore until about 3000 B.C. The earliest extracted form and composition of iron was similar to that produced in modern direct-reduction processes that operate at temperatures below the melting point of iron. This reduced solid iron is called sponge iron because of its porosity. Sponge iron contains significant amounts of trapped carbon and unreduced oxide, and it must be worked severely or melted and refined. Blast furnaces produce liquid iron that is commonly called hot metal or pig iron. Pig iron contains more than 4% carbon and significant amounts of manganese, silicon, and phosphorus. Pig iron must be refined in subsequent steelmaking operations. Trends. The largest modern furnaces can produce more than 2.2 × 107 lb (107 kg) of iron per day, operating with careful charge control, high top pressure, high blast temperature, and fuel injection. It has become common practice to lower the quantity of slag and its basicity. Sulfur control is provided through the addition of a separate process for the desulfu-
rization of hot metal. With the development of advanced sensors and mathematical modeling methods and supercomputers, it has become possible to predict and control the operation of large blast furnaces to an extent far greater than was previously the case. In seconds, analyses of variations can be made, and corrective actions can be taken to maintain high productivity and efficiency. See SIMULATION; SUPERCOMPUTER. The modern iron blast furnace is a highly developed, very efficient, and economical process for large steel plants. However, it requires coke, which must be produced in associated coke ovens, a practice associated with environmental problems; and it is very large and inflexible. Once it becomes operational, the blast furnace goes continuously, and the associated steel plant activities must be matched to take its output. Further, the blast furnace offers little opportunity for incremental increases in steel plant capacity. A major expansion and investment is required for installation of a new blast furnace. A typical large modern integrated steel plant operates with one to four blast furnaces. Direct-reduction processes have been constructed at many sites throughout the world, particularly in the developing countries. The direct reduction processes offer the alternative of a shippable iron product suitable for steel production in small- and largescale steel plants. In effect, the direct-reduction iron products act as a substitute for scrap (Fig. 1) with the added advantage that they are cleaner and less contaminated with tramp elements than bundles of commercial scrap. The daily production rates of the direct-reduction processes are less than 2.2 × 106 lb
Fig. 4. Temperature profiles in a modern blast furnace. (After J. G. Peacey and W. G. Davenport, The Iron Blast Furnace, Pergamon, 1979)
(1 × 106 kg) and often are as low as 4.2 × 105 lb (2 × 105 kg). The direct smelting processes are designed to be flexible and compact with high production intensity (tons of iron produced per unit of working reactor volume) and to start up quickly and adjust production rate to demand. They are adaptable to small-scale steel production facilities. However, molten iron is not shippable and must be transferred to nearby steelmaking furnaces or cast into pigs. See CAST IRON; IRON; IRON ALLOYS; PYROMETALLURGY; STEEL MANUFACTURE. George R. St. Pierre Bibliography. American Institute of Mining, Metallurgical, and Petroleum Engineers (AIME), Blast Furnace: Theory and Practice, 1969; AIME, Ironmaking Proceedings, annually; Gmelin Institute for Inorganic Chemistry of the Max Planck Society for the Advancement of Science, Metallurgy of Iron, 4th ed., vols. 1–12, 1992; R. J. Fruehan et al. (eds.), Making, Shaping, and Treating of Steel, 11th ed., 1998; W.-K. Lu, A critical review of blast furnace reactions, Iron Steelmak., pp. 51–62, October 1987; T. P. McAloon, Alternate ironmaking update, Iron Steelmak., pp. 37–56, February 1994; J. G. Peacey and W. G. Davenport, The Iron Blast Furnace, 1979.
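The approximate figures recovered from Fig. 3 and Table 2 can be checked for consistency. The following sketch tabulates only the values quoted in this article, per 1000 kg of molten metal; the small mass imbalance presumably reflects rounding in the source.

```python
# Check the approximate blast furnace balances quoted in this article.
# Masses in kg per 1000 kg of molten metal (Fig. 3); energies in 10^3 kJ/kg (Table 2).

inputs_kg = {"pellets/sinter/ore": 1600, "flux": 150, "coke": 450,
             "air blast": 1300, "pure O2": 40, "hydrocarbon fuel": 50}
outputs_kg = {"top gas": 2300, "dust": 10, "slag": 300, "molten metal": 1000}

energy_in = {"coke": 14.87, "hot blast": 1.95, "injected fuel": 1.12}
top_gas_recovered = 7.40

print("mass in :", sum(inputs_kg.values()), "kg")    # about 3590
print("mass out:", sum(outputs_kg.values()), "kg")   # about 3610
total_in = sum(energy_in.values())                   # 17.94
print(f"energy recovered in top gas: {top_gas_recovered / total_in:.0%}")      # about 41%
print(f"energy consumed in process : {1 - top_gas_recovered / total_in:.0%}")  # about 59%
```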
Ironwood The name given to any of at least 10 kinds of tree in the United States. Because of uncertainty as to just which tree is indicated, the name ironwood has been abandoned in the checklist of the native and naturalized trees of the United States. Probably the best known of the 10 is the American hornbeam (Carpinus caroliniana). Some of the others are Ostrya virginiana, eastern hophornbeam; Bumelia lycioides, buckthorn bumelia; B. tenax, tough bumelia; Cliftonia monophylla, buckwheat tree; and Cyrilla racemiflora, swamp cyrilla or swamp ironwood. All of these species except Ostrya are restricted to the southeastern United States. Others commonly called ironwood are Eugenia confusa, redberry eugenia of southern Florida and the Florida Keys; Exothea paniculata, butterbough of southern Florida; and Ostrya knowltonii, Knowlton hophornbeam of southwestern United States. Leadwood (Krugiodendron ferreum), a native of southern Florida, has the highest specific gravity of all woods native to the United States and is also known as black ironwood. See FOREST AND FORESTRY; HOPHORNBEAM; HORNBEAM; TREE. Arthur H. Graves; Kenneth P. Davis
Irregularia The name given by P. A. Latreille in 1825 to what is now recognized as a natural (monophyletic, that is, evolved from a common ancestral form) group of echinoids (sea urchins). Previously, workers such as J. Durham and R. Melville suggested that the group was polyphyletic (evolved from different ancestral lineages), a proposal followed by other influential works. However, this suggestion has been rejected by more recent workers such as M. Jensen, A. Smith, and T. Saucède and colleagues who showed in phylogenetic analyses that all irregular echinoids are derived from eodiadematid-like forms in the Lower Jurassic. Most Irregularia, and all the extant forms, are exocyclic: the periproct (the area surrounding the anus) plus the anus is displaced in the direction of interambulacrum 5 (Lovén's numbering system) to become situated at the edge of or outside the apical system. (The interambulacrum is the area between two ambulacra, which are the radial series of plates along which the tube feet are arranged.) Therefore, these Irregularia exhibit an anterior-posterior axis running down the centers of ambulacrum III and interambulacrum 5, inducing secondary bilateral symmetry. Bilateral symmetry (symmetry along a central axis, with division into equivalent left and right halves) is most pronounced in the spatangoid heart urchins, and in stark contrast to the radial symmetry (symmetry about a central point) of the "regular" urchins. Additional characteristics of the Irregularia include the miniaturization of external appendages [spines, tube feet, pedicellariae (small grasping organs)], the relatively large size of the periproct, the high density of primary tubercles, and in most forms a relatively low test (the internal, limestone skeleton of sea urchins) profile. The Irregularia includes a basal group, the Eodiadematidae, plus holectypoids, pygasteroids, cassiduloid and clypeasteroid neognathostomates, and holasteroid and spatangoid atelostomates. Irregular echinoids left a richer fossil record than the regular urchins, in part because the test of irregulars is less prone to postmortem disintegration and because they tended to live in softer substrata that led to better preservation. Irregular echinoids are largely sediment-swallowing "podial particle pickers" and have developed sophisticated mechanisms for handling fine sediments. See ECHINOIDA; ECHINOIDEA; ECHINODERMATA; HOLECTYPOIDA; NEOGNATHOSTOMATA; PYGASTEROIDA. Rich Mooi Bibliography. J. W. Durham, Phylogeny and evolution, in R. C. Moore (ed.), Part U: Treatise on Invertebrate Paleontology, Echinodermata-Echinozoa-Echinoidea, vol. 3, pp. 266–269, 1966; J. W. Durham and R. V. Melville, A classification of echinoids, J. Paleontol., 31:242–272, 1957; M. Jensen, Morphology and classification of the Euechinoidea Bronn, 1860—a cladistic analysis, Videnskabelige Meddelelser fra Dansk Naturhistorisk Forening i Kjobenhavn, 143:7–99, 1981; A. B. Smith, Echinoid Palaeobiology, Allen & Unwin, London, 1984.
Irrigation (agriculture) The artificial application of water to the soil to produce plant growth. Irrigation also cools the soil and atmosphere, making the environment favorable for plant growth. The use of some form of irrigation is
TABLE 1. Example of consumption of water by various crops, in inches∗

Crop        April   May    June   July   Aug.   Sept.   Oct.   Seasonal total
Alfalfa      3.3    6.7     5.4    7.8    4.2    5.6     4.4        37.4
Beets               1.9     3.3    5.3    6.9    5.8     1.1        24.3
Cotton       1.1    2.0     4.1    5.8    8.6    6.7     2.7        31.0
Peaches      1.0    3.4     6.7    8.4    6.4    3.1     1.1        30.0
Potatoes                    0.7    3.4    5.8    4.4                14.0

∗1 in. = 25 mm.
well documented throughout the history of civilization. Use of water by plants. Growing plants use water almost continuously. Growth of crops under irrigation is stimulated by optimum moisture, but retarded by excessive or deficient amounts. Factors influencing the rate of water use by plants include the type of plant and stage of growth, temperature, wind velocity, humidity, sunlight duration and intensity, and available water supply. Plants use the least amount of water upon emergence from the soil and near the end of the growing period. Irrigation and other management practices should be coordinated with the various stages of growth. A vast amount of research has been done on the use of water by plants, and results are available for crops under varying conditions. See PLANT GROWTH. Consumptive use. In planning new or rehabilitating old irrigation projects, consumptive use is the most important factor in determining the amount of water required. It is also used to determine water rights. Consumptive use, or evapotranspiration, is defined as water entering plant roots to build plant tissues, water retained by the plant, water transpired by leaves into the atmosphere, and water evaporated from plant leaves and from adjacent soil surfaces. Consumptive use of water by various crops under varying conditions has been determined by soil-moisture studies and computed by other wellestablished methods for many regions of the United States and other countries. Factors which have been shown to influence consumptive use are precipitation, air temperature, humidity, wind movement, the growing season, and latitude, which influences hours of daylight. Table 1 shows how consumptive use varies at one location during the growing season. When consumptive-use data are computed for an extensive area, such as an irrigation project, the results will be given in acre-feet per acre for each month of the growing season and the entire irrigation period. Peak-use months determine system capacity needs. An acre-foot is the amount of water required to cover 1 acre 1 ft deep (approx. 44 ft3 or 1214 m3 of water). Soil, plant, and water relationships. Soil of root-zone depth is the storage reservoir from which plants obtain moisture to sustain growth. Plants take from the soil not only water but dissolved minerals necessary to build plant cells. How often this reservoir must be
filled by irrigation is determined by the storage capacity of the soil, depth of the root zone, water use by the crop, and the amount of depletion allowed before a reduction in yield or quality occurs. Table 2 shows the approximate amounts of water held by soils of various textures. Water enters coarse, sandy soils quite readily, but in heavy-textured soils the entry rate is slower. Compaction and surface conditions also affect the rate of entry. Soil conditions, position of the water table, length of growing season, irrigation frequency, and other factors exert strong influence on root-zone depth. Table 3 shows typical root-zone depths in welldrained, uniform soils under irrigation. The depth of rooting of annual crops increases during the entire growing period, given a favorable, unrestricted root zone. Plants in deep, uniform soils usually consume water more slowly from the lower root-zone area than from the upper. Thus, the upper portion is the first to be exhausted of moisture. For most crops, the entire root zone should be supplied with moisture when needed. Maximum production can usually be obtained with most irrigated crops if not more than 50% of the available water in the root zone is exhausted during the critical stages of growth. Many factors influence this safe-removal percentage, including the type of
TABLE 2. Approximate amounts of water in soils available to plants

Soil texture         Water capacity in inches for each foot of depth
                     (in centimeters for each meter of depth)
Coarse sandy soil    0.5–0.75 (4.15–6.25)
Sandy loam           1.25–1.75 (10.40–14.60)
Silt loam            1.75–2.50 (14.60–20.85)
Heavy clay           1.75–2.0 (14.60–16.65) or more

TABLE 3. Approximate effective root-zone depths for various crops

Crop        Root-zone depth, ft (m)
Alfalfa     6 (1.8)
Corn        3 (0.9)
Cotton      4 (1.2)
Potatoes    2 (0.6)
Grasses     2 (0.6)
Irrigation (agriculture) crop grown and the rate at which water is being removed. Application of irrigation water should not be delayed until plants signal a need for moisture; wilting in the hot parts of the day may reduce crop yields considerably. Determination of the amount of water in the root zone can be done by laboratory methods, which are slow and costly. However, in modern irrigation practice, soil-moisture-sensing devices are used to make rapid determinations directly with enough accuracy for practical use. These devices, placed in selected field locations, permit an operator to schedule periods of water application for best results. Evaporation pans and weather records can be used to estimate plant-water use. Computerizing these data also helps farmers schedule their irrigations. The irrigation system should be designed to supply sufficient water to care for periods of most rapid evapotranspiration. The rate of evapotranspiration may vary from 0 to 0.4 in. per day (10 mm per day) or more. See PLANT-WATER RELATIONS. Water quality. All natural irrigation waters contain salts, but only occasionally are waters too saline for crop production when used properly. When more salt is applied through water and fertilizer than is removed by leaching, a salt buildup can occur. If the salts are mainly calcium and magnesium, the soils become saline, but if salts predominantly are sodium, a sodic condition is possible. These soils are usually found in arid areas, especially in those areas where drainage is poor. Rainfall in humid areas usually carries salts downward to the ground water and eventually to the sea. Saline soils may reduce yields and can be especially harmful during germination. Some salts are toxic to certain crops, especially when applied by sprinkling and allowed to accumulate on the plants. Salt levels in the soil can be controlled by drainage, by overirrigation, or by maintaining a high moisture level which keeps the salts diluted. See PLANTS OF SALINE ENVIRONMENTS. Sodic soils make tillage and water penetration difficult. Drainage, addition of gypsum or sulfur, and overirrigation usually increase productivity. Ponding or sprinkling can be used to leach salts. Intermittent application is usually better and, when careful soil-moisture management is practiced, only small amounts of excess irrigation are needed to maintain healthy salt levels. Diagnoses of both water and soil are necessary for making management decisions. Commercial laboratories and many state universities test both water and soil, and make recommendations. Methods of application. Water is applied to crops by surface, subsurface, sprinkler, and drip irrigation. Surface irrigation includes furrow and flood methods. Furrow method. This method is used for row crops (Fig. 1). Corrugations or rills are small furrows used on close-growing crops. The flow, carried in furrows, percolates into the soil. Flow to the furrow is usually supplied by siphon tubes, spiles, gated pipe, or
463
Fig. 1. Furrow method of irrigation. Water is supplied by pipes with individual outlets, or by ditches and siphon tubes.
valves from buried pipe. Length of furrows and size of stream depend on slope, soil type, and crop; infiltration and erosion must be considered. Flood method. Controlled flooding is done with border strips, contour or bench borders, and basins. Border strip irrigation is accomplished by advancing a sheet of water down a long, narrow area between low ridges called borders. Moisture enters the soil as the sheet advances. Strips vary from about 20 to 100 ft (6 to 30 m) in width, depending mainly on slope (both down and across), and amount of water available. The border must be well leveled and the grade uniform; best results are obtained on slopes of 0.5% or less. The flood method is sometimes used on steeper slopes, but maldistribution and erosion make it less effective.
Fig. 2. A side-roll sprinkler system which uses the main supply line (often more than 1000 ft or 300 m long) to carry the sprinkler heads and as the axle for wheels.
Fig. 3. Center-pivot systems are very popular in new irrigation developments.
Bench-border irrigation is sometimes used on moderately gentle, uniform slopes. The border strips, instead of running down the slope, are constructed across it. Since each strip must be level in width, considerable earth moving may be necessary. Basin irrigation is well adapted to flatlands. It is done by flooding a diked area to a predetermined depth and allowing the water to enter the soil throughout the root zone. Basin irrigation may be utilized for all types of crops, including orchards where soil and topographic conditions permit. Subirrigation. This type of irrigation is accomplished by raising the water table to the root zone of the crop or by carrying moisture to the root zone by perforated underground pipe. Either method requires special soil conditions for successful operation.
Fig. 4. Center-pivot irrigation system irrigating corn. A single unit can water as much as 500 acres (200 hectares) or more. (Eastern Iowa Light and Power Cooperative)
Sprinkler systems. A sprinkler system consists of pipelines which carry water under pressure from a pump or elevated source to lateral lines along which sprinkler heads are spaced at appropriate intervals. Laterals are moved from one location to another by hand or tractor, or they are moved automatically. The side-roll wheel system, which utilizes the lateral as an axle (Fig. 2), is very popular as a labor-saving method. The center-pivot sprinkler system (Fig. 3) consists of a lateral carrying the sprinkler heads, and is moved by electrical or hydraulic power in a circular course irrigating an area containing up to 135– 145 acres (546,200–586,700 m2). Extra equipment can be attached in order to irrigate the corners, or solid sets can be used. Solid-set systems are systems with sufficient laterals and sprinklers to irrigate the entire field without being moved. These systems are quite popular for irrigating vegetable crops or other crops requiring light, frequent irrigations and, in orchards, where it is difficult to move the laterals. Sprinkler irrigation has the advantage of being adaptable to soils too porous for other systems. It can be used on land where soil or topographic conditions are unsuitable for surface methods. It can be used on steep slopes and operates efficiently with a small water supply. Drip irrigation. This is a method of providing water to plants almost continuously through small-diameter tubes and emitters. It has the advantage of maintaining high moisture levels at relatively low capital costs. It can be used on very steep, sandy, and rocky areas and can utilize saline waters better than most other systems. Clean water, usually filtered, is necessary to prevent blockage of tubes and emitters. The system has been most popular in orchards and vineyards, but is also used for vegetables, ornamentals, and landscape plantings. Automated systems. Automation is being used with solid-set and continuous-move types of systems, such as the center-pivot (Fig. 4) and lateral-move. Surfaceirrigated systems are automated with check dams, operated by time clocks or volume meters, which open or close to divert water to other areas. Sprinkler systems, pumps, and check dams can all be activated by radio signals or low-voltage wired systems, which, in turn, can be triggered by soil-moisture-sensing devices or water levels in evaporation pans. Automatically operated pumpback systems, consisting of a collecting pond and pump, are being used on surface-irrigated farms to better utilize water and prevent silt-laden waters from returning to natural streams. Multiple uses. With well-designed and -managed irrigation systems, it is possible to apply chemicals and, for short periods of time, to moderate climate. Chemicals which are being used include fertilizers, herbicides, and some fungicides. Effectiveness depends on uniformity of mixing and distribution and on application at the proper times. Chemicals must be registered to be used in this manner. Solid-set systems are frequently used to prevent
frost damage to plants and trees, since, as water freezes, it releases some heat. A continuous supply of water is needed during the protecting period. However, large volumes of water are required, and ice loads may cause limb breakage. Sequencing of sprinklers for cooling ensures bloom delay in the spring and reduces heat damage in the summer. Humid and arid regions. The percentage of increase in irrigated land is greater in humid areas than in arid and semiarid areas, although irrigation programs are often more satisfactory where the farmer does not depend on rainfall for crop growth. Good yields are obtained by well-timed irrigation, maintenance of high fertility, keeping the land well cultivated, and using superior crop varieties. There is little difference in the principles of crop production under irrigation in humid and arid regions. The programming of water application is more difficult in humid areas because natural precipitation cannot be accurately predicted. Most humid areas utilize the sprinkler method. To be successful, any irrigation system in any location must have careful planning with regard to soil conditions, topography, climate, cropping practices, water quality and supply, as well as engineering requirements. See AGRICULTURAL SOIL AND CROP PRACTICES; LAND DRAINAGE (AGRICULTURE); TERRACING (AGRICULTURE); WATER CONSERVATION. Mel A. Hagood Bibliography. R. Cuenca, Irrigation System Design, 1989; D. R. Hay (ed.), Planning Now for Irrigation and Drainage in the 21st Century, 1988; D. Hillel (ed.), Advances in Irrigation Engineering, vol. 4, 1987.
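The scheduling arithmetic implied by Tables 2 and 3 and the 50% allowable-depletion guideline discussed earlier can be sketched as follows. The peak evapotranspiration rate of 0.3 in. per day and the crop and soil pairings are assumed purely for illustration (the article quotes a range of 0 to 0.4 in. per day or more), and the function name is not from any irrigation standard.

```python
# Rough irrigation-interval estimate from root-zone storage and peak water use.
# Water capacity (in. per ft of depth) from Table 2, root depths (ft) from Table 3,
# 50% allowable depletion as discussed in the text, and an assumed peak
# evapotranspiration of 0.3 in./day (the article quotes 0 to 0.4 in./day or more).

def irrigation_interval_days(capacity_in_per_ft, root_depth_ft,
                             allowable_depletion=0.5, peak_et_in_per_day=0.3):
    available = capacity_in_per_ft * root_depth_ft        # total available water, in.
    readily_available = available * allowable_depletion   # usable before yield loss
    return readily_available / peak_et_in_per_day         # days between irrigations

# Alfalfa on a silt loam: about 2 in./ft (Table 2), 6 ft root zone (Table 3)
print(f"alfalfa, silt loam: about {irrigation_interval_days(2.0, 6.0):.0f} days")
# Potatoes on a sandy loam: about 1.5 in./ft, 2 ft root zone
print(f"potatoes, sandy loam: about {irrigation_interval_days(1.5, 2.0):.0f} days")
```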
Isentropic flow Compressible flow in which entropy remains constant along streamlines. Generally this implies that entropy is the same everywhere in the flow, in which case the flow is also referred to as homentropic flow. See COMPRESSIBLE FLOW; ENTROPY; ISENTROPIC PROCESS. Because of the second law of thermodynamics, an isentropic flow does not strictly exist. From the definition of entropy, an isentropic flow is both adiabatic and reversible. However, all real flows experience to some extent the irreversible phenomena of friction, thermal conduction, and diffusion. Any nonequilibrium, chemically reacting flow is also irreversible. However, there are a large number of gas dynamic problems with entropy increase negligibly slight, which for the purpose of analysis are assumed to be isentropic. Examples are flow through subsonic and supersonic nozzles, as in wind tunnels and rocket engines; and shock-free flow over a wing, fuselage, or other aerodynamic shape. For these flows, except for the thin boundary-layer region adjacent to the surface where friction and thermal conduction effects can be strong, the outer inviscid flow can be considered isentropic. If shock
waves exist in the flow, the entropy increase across these shocks destroys the assumption of isentropic flow, although the flow along streamlines between shocks may be isentropic. See ADIABATIC PROCESS; BOUNDARY-LAYER FLOW; NOZZLE; SHOCK WAVE; SUBSONIC FLIGHT; THERMODYNAMIC PRINCIPLES; THERMODYNAMIC PROCESSES. The assumption of an isentropic flow greatly simplifies the analysis of a flowfield. In many cases, the simple statement of constant entropy replaces a complicated equation such as the momentum or energy equation. In particular, in an isentropic flow of a calorically perfect gas (constant specific heat), the pressure, density, and temperature of the flow are uniquely related to the local Mach number, as shown in Eqs. (1)–(3).

p0/p = [1 + ((γ − 1)/2)M²]^(γ/(γ−1))   (1)

ρ0/ρ = [1 + ((γ − 1)/2)M²]^(1/(γ−1))   (2)

T0/T = 1 + ((γ − 1)/2)M²   (3)
In these equations, γ is the ratio of specific heat at constant pressure to specific heat at constant volume; p, ρ, and T are the local static pressure, density, and temperature, respectively; M is the local Mach number; and p0, ρ0, and T0 are the total pressure, density, and temperature, respectively. (Total conditions are defined as the conditions that would exist at a point in the flow if the velocity at that point were isentropically reduced to zero.) Equation (3), involving T, holds for any adiabatic flow of a calorically perfect gas; such a flow does not have to be isentropic. Equations (1)–(3) are applicable to isentropic flows of any geometric complexity, including three-dimensional flows. In this manner, isentropic flows are much easier to compute than nonisentropic. See FLUID-FLOW PRINCIPLES; GAS DYNAMICS. John D. Anderson, Jr. Bibliography. J. D. Anderson, Jr., Fundamentals of Aerodynamics, 3d ed., 2001; J. D. Anderson, Jr., Introduction to Flight, 5th ed., 2005; J. D. Anderson, Jr., Modern Compressible Flow: With Historical Perspective, 3d ed., 2003.
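Equations (1)–(3) are straightforward to evaluate numerically. The following sketch, with γ = 1.4 assumed for air, computes the total-to-static ratios at a few Mach numbers.

```python
# Total-to-static ratios for isentropic flow of a calorically perfect gas,
# from Eqs. (1)-(3); gamma = 1.4 is appropriate for air at moderate temperatures.

def isentropic_ratios(mach, gamma=1.4):
    t_ratio = 1.0 + 0.5 * (gamma - 1.0) * mach**2     # T0/T, Eq. (3)
    p_ratio = t_ratio ** (gamma / (gamma - 1.0))      # p0/p, Eq. (1)
    rho_ratio = t_ratio ** (1.0 / (gamma - 1.0))      # rho0/rho, Eq. (2)
    return p_ratio, rho_ratio, t_ratio

for m in (0.5, 1.0, 2.0):
    p, rho, t = isentropic_ratios(m)
    print(f"M = {m}: p0/p = {p:.3f}, rho0/rho = {rho:.3f}, T0/T = {t:.3f}")
```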
Isentropic process In thermodynamics a change that is accomplished without any increase or decrease of entropy is referred to as isentropic. Since the entropy always increases in a spontaneous process, one must consider reversible or quasistatic processes. During a reversible process the quantity of heat transferred, dQ, is directly proportional to the system's entropy change, dS, as in Eq. (1), where T is the absolute temperature of the system.

dQ = T dS   (1)

Systems which are thermally
insulated from their surroundings undergo processes without any heat transfer; such processes are called adiabatic. Thus during an isentropic process there are no dissipative effects and, from Eq. (1), the system neither absorbs nor gives off heat. For this reason the isentropic process is sometimes called the reversible adiabatic process. See ADIABATIC PROCESS; ENTROPY; THERMODYNAMIC PROCESSES. Work done during an isentropic process is produced at the expense of the amount of internal energy stored in the nonflow or closed system. Thus, the useful expansion of a gas is accompanied by a marked decrease in temperature, tangibly demonstrating the decrease of internal energy stored in the system. For ideal gases the isentropic process can be expressed by Eq. (2).

P1V1^γ = P2V2^γ = constant   (2)

In Eq. (2), P is the pressure, V is
the volume, and γ is the ratio between the specific heat at constant pressure and the specific heat at constant volume for the given gas. It can be closely approximated by the values of 1.67 and 1.40 for dilute monatomic and diatomic gases, respectively. For a comparison of various processes involving a gas see POLYTROPIC PROCESS. Philip E. Bloomfield
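A minimal Python sketch of Eq. (2) for an ideal gas follows; it is an editorial addition, and the function name and sample numbers are illustrative. It combines Eq. (2) with the ideal-gas law to give the end state of a reversible adiabatic (isentropic) compression.

    def isentropic_state(p1, v1, t1, v2, gamma=1.4):
        """Final pressure and temperature of an ideal gas taken
        isentropically from volume v1 to volume v2.

        Uses Eq. (2), P1*V1**gamma = P2*V2**gamma, together with the
        ideal-gas law, which gives T1*V1**(gamma - 1) = T2*V2**(gamma - 1).
        """
        p2 = p1 * (v1 / v2) ** gamma
        t2 = t1 * (v1 / v2) ** (gamma - 1.0)
        return p2, t2

    # Example: diatomic gas (gamma = 1.40) compressed to half its volume
    p2, t2 = isentropic_state(p1=100.0e3, v1=1.0, t1=300.0, v2=0.5)
    print(round(p2), round(t2, 1))   # about 263902 Pa and 396.0 K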
Isentropic surfaces Surfaces along which the entropy and potential temperature of air are constant. Potential temperature, in meteorological usage, is defined by the relationship given below,
θ = T (1000/P)^((Cp − Cv)/Cp)
in which T is the air temperature, P is atmospheric pressure expressed in millibars, Cp is the heat capacity of air at constant pressure and Cv is the heat capacity at constant volume. Since the potential temperature of an air parcel does not change if the processes acting on it are adiabatic (no exchange of heat between the parcel and its environment), a surface of constant potential temperature is also a surface of constant entropy. The slope of isentropic surfaces in the atmosphere is of the order of 1/100 to 1/1000 . An advantage of representing meteorological conditions on isentropic surfaces is that there is usually little air motion through such surfaces, since thermodynamic processes in the atmosphere are approximately adiabatic. See ADIABATIC PROCESS; ATMOSPHERIC GENERAL CIRCULATION. Frederick Sanders Bibliography. J. R. Holton, An Introduction to Dynamic Meteorology, 4th ed., 2004; J. C. McWilliams, Fundamentals of Geophysical Fluid Dynamics, 2006; T. D. Potter and B. R. Colman (eds.), Handbook of Weather, Climate and Water: Dynamics, Climate, Physical Meteorology, Weather Systems, and Measurements, 2003.
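The defining relation above is easy to evaluate directly. The following short Python sketch is an editorial addition; the heat-capacity values for dry air are typical textbook figures, not values taken from this article.

    def potential_temperature(t_kelvin, p_millibars, cp=1004.0, cv=717.0):
        """Potential temperature of dry air from the relation in the text:
        theta = T * (1000/P)**((Cp - Cv)/Cp), with P in millibars."""
        exponent = (cp - cv) / cp          # about 0.286 for dry air
        return t_kelvin * (1000.0 / p_millibars) ** exponent

    # Example: an air parcel at 280 K and 850 mb
    print(round(potential_temperature(280.0, 850.0), 1))   # about 293.3 K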
Ising model A model which consists of a lattice of “spin” variables with two characteristic properties: (1) each of the spin variables independently takes on either the value +1 or the value −1; and (2) only pairs of nearest neighboring spins can interact. The study of this model (introduced by Ernst Ising in 1925) in two dimensions, where many exact calculations have been carried out explicitly, forms the basis of the modern theory of phase transitions and, more generally, of cooperative phenomena. The two-dimensional Ising model was shown to have a phase transition by R. E. Peierls in 1936, and the critical temperature or Curie temperature, that is, the temperature at which this phase transition takes place, was calculated by H. A. Kramers and G. H. Wannier and by E. W. Montroll in 1941. Major breakthroughs were accomplished by Lars Onsager in 1944, by Bruria Kaufman and Onsager in 1949, and by Chen Ning Yang in 1952. Onsager first obtained the free energy and showed that the specific heat diverges as −ln|1 − T/Te| when the temperature T is near the critical temperature Tc; Kaufman and Onsager computed the short-range order; and Yang calculated the spontaneous magnetization. Since then several other properties have been obtained, and since 1974 connections with relativistic quantum field theory have been made. See QUANTUM FIELD THEORY. Cooperative phenomena. A macroscopic piece of material consists of a large number of atoms, the number being of the order of Avogadro’s number (approximately 6 × 1023). Thermodynamic phenomena all depend on the participation of such a large number of atoms. Even though the fundamental interaction between atoms is short-ranged, the presence of this large number of atoms can, under suitable conditions, lead to an effective interaction between widely separated atoms. Phenomena due to such effective long-range interactions are referred to as cooperative phenomena. The simplest examples of cooperative phenomena are phase transitions. The most familiar phase transition is either the condensation of steam into water or the freezing of water into ice. Only slightly less familiar is the ferromagnetic phase transition that takes place at the Curie temperature, which, for example, is roughly 1043 K for iron. Of the several models which exhibit a phase transition, the Ising model is the best known. In three dimensions the model is so complicated that no exact computation has ever been made, while in one dimension the Ising model does not undergo a phase transition. However, in two dimensions the Ising model not only has a ferromagnetic phase transition but also has very many physical properties which may be exactly computed. Indeed, despite the restriction on dimensionality, the two-dimensional Ising model exhibits all of the phenomena peculiar to magnetic systems near the Curie temperature. See CURIE TEMPERATURE; FERROMAGNETISM; MAGNETISM.
Definition of model. The mutual interaction energy of the pair of spins σα and σα′ when α and α′ are nearest neighbors may be written as −E(α,α′)σα σα′. The meaning of this is that the interaction energy is −E(α,α′) when σα and σα′ are both +1 or both −1, and is +E(α,α′) when σα = 1, σα′ = −1, or σα = −1, σα′ = 1. In addition, a spin may interact with an external magnetic field H with energy −Hσα. From these two basic interactions the total interaction energy for the square lattice may be written as Eq. (1),

E = −Σj Σk [E1(j,k) σj,k σj,k+1 + E2(j,k) σj,k σj+1,k + H σj,k]   (1)

where j specifies the row and k specifies the column of the lattice. In this form the interaction energies E1(j,k) and E2(j,k) are allowed to vary arbitrarily throughout the lattice. A special case of great importance is the translationally invariant case (E1 and E2 independent of j and k), which was studied by Onsager in 1944. This is the model needed to study a pure ferromagnet without impurities. Several generalizations of Ising's original model have been considered. For example, σ can be allowed to take on more values than just ±1, and interactions other than nearest neighbor can be used. For these generalizations no exact calculations have been performed in two or three dimensions. However, various approximate calculations indicate that the phase transition properties of these models are the same as those of the Onsager lattice. The extension to the nontranslationally invariant case, where E1(j,k) and E2(j,k) are treated as independent random variables, is important for studying the effects of impurities in ferromagnets.
Thermodynamic properties. The basic simplification in framing the definition of the Ising model of the preceding section is the choosing of the fundamental variables to be the numbers σj,k, which can be only +1 or −1. Because of this choice there can be no terms in the interaction energy which refer to kinetic energy or to angular momentum. Consequently, the σj,k do not change with time, and study of the system is, by necessity, confined to those physical properties which depend only on the distribution of energy levels of the system. When the number of energy levels is large, this study requires the use of statistical mechanics. Statistical mechanics allows the calculation of average macroscopic properties from the microscopic interaction E. If A is some property of the spins σ of the system, then the thermodynamic average of A is given by Eq. (2), where T is the temperature, k is Boltzmann's constant, and Z is given by Eq. (3),

⟨A⟩ = lim N→∞ (1/Z) Σ{σ} A e^(−E(σ)/kT)   (2)

Z = Σ{σ} e^(−E(σ)/kT)   (3)

where the sums are over all values of σj,k = ±1, and N is the number of rows and the number of columns. It is mandatory that the thermodynamic limit N → ∞ be taken for these thermodynamic averages to have a precise meaning. See BOLTZMANN CONSTANT. The most important thermodynamic properties of a ferromagnet are the internal energy per site u = ⟨E⟩/N², the specific heat c = ∂u/∂T, the magnetization per site M = ⟨σ⟩, and the magnetic susceptibility χ = ∂M/∂H. These quantities have been computed for the two-dimensional Ising model even when E1 ≠ E2, but for convenience this discussion is restricted to E1 = E2 = E. Onsager studied the two-dimensional square lattice at H = 0 and computed the specific heat exactly. From that calculation he found that the specific heat was infinite at the critical temperature of Kramers and Wannier, given as the solution of Eq. (4).

sinh (2E/kTc) = 1   (4)

When T is close to the critical temperature Tc, the specific heat is approximated by Eq. (5).

c ∼ −(8E²/πkTc²) ln |1 − T/Tc|   (5)

The behavior of the specific heat for any temperature is plotted in Fig. 1.

Fig. 1. Specific heat c/k of Onsager's lattice, as a function of kT/E, for E2/E1 = 1.

The spontaneous magnetization is defined as M(0) = lim H→0+ M(H). For T > Tc, M(0) = 0. For T < Tc, Yang found that M(0) is given by Eq. (6). When T is near Tc, M(0) is approximated by Eq. (7). The behavior of M(0) as a function of T is plotted in Fig. 2.

M(0) = [1 − sinh⁻⁴(2E/kT)]^(1/8)   (6)

M(0) ∼ [8√2 E/(kTc)]^(1/8) (1 − T/Tc)^(1/8)   (7)

Fig. 2. Spontaneous magnetization M(0) of Onsager's lattice, as a function of T/Tc, for E2/E1 = 1.

The magnetic susceptibility χ at H = 0 is much more difficult to compute than either the specific heat or the spontaneous magnetization. Indeed, no closed-form expression is known for χ over the entire range of temperature. However, near Tc it is known that, as T → Tc+, χ is approximated by Eq. (8) and, as T → Tc−, by Eq. (9),

χ(T) ∼ C0+ |1 − Tc/T|^(−7/4) + C1+ |1 − Tc/T|^(−3/4) + C2   (8)

χ(T) ∼ C0− |1 − Tc/T|^(−7/4) + C1− |1 − Tc/T|^(−3/4) + C2   (9)

where C0+ = 0.9625817322···, C0− = 0.0255369719···, C1+ = 0.0749881538···, and C1− = −0.019894107···. See INTERNAL ENERGY; MAGNETIC SUSCEPTIBILITY; MAGNETIZATION; SPECIFIC HEAT OF SOLIDS; STATISTICAL MECHANICS; THERMODYNAMIC PRINCIPLES.
Random impurities. A question which can be very usefully studied in the Ising model is the generalization of statistical mechanics to deal with the experimental situation in which the interaction energy of the system is not completely known because of the presence of impurities. The term “impurity” refers not only to the presence of foreign material in a sample but to any physical property, such as defects or isotopic composition, which makes lattice sites different from one another. The distribution of these impurities is governed by spin-independent forces. At least two different situations can be distinguished. (1) As the temperature changes, the distribution of impurities may change; such a situation will occur,
Fig. 3. Comparison of the impure Ising model specific heat with the observed specific heat of EuS for T >Tc.
for example, near the melting point of a lattice. (2) The distribution of impurities may be independent of temperature, at least on the time scale of laboratory measurements; such a distribution will obtain when the temperature of a lattice is well below the melting temperature. Impurities of this sort are said to be frozen in. See CRYSTAL DEFECTS. For a study of frozen-in impurities to be realistic, the impurities must be distributed at random throughout the lattice. The translational invariance of the system is now totally destroyed, and it is not at all clear that the phase transition behavior of the pure and random system should be related to each other at all. These problems were studied for a special case of the Ising model by McCoy and Wu in 1968. They let E2(j, k) depend on j but not k and kept E1(j, k) independent of both j and k. Then the variables E2(j) were chosen with a probability distribution P(E2). When P(E2) was of narrow width, they showed that logarithmic divergence of Onsager’s specific heat is smoothed out into an infinitely differentiable essential singularity. Such a smoothing out of sharp phase transition behavior may in fact have been observed. The results of one such experiment, carried out by B. J. C. van der Hoeven and colleagues, are compared with the results of the Ising model random specific heat calculation in Fig. 3. Barry M. McCoy; Tai Tsun Wu Bibliography. C. Domb and M. Green, Phase Transitions and Critical Phenomena, 1972; R. Liebmann, Statistical Mechanics of Periodic Frustrated Ising Systems, 1986; B. McCoy and T. T. Wu, The Two Dimensional Ising Model, 1973.
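As an editorial illustration of Eqs. (4) and (6), and not part of the original article, the Python sketch below locates the critical temperature of the square lattice and evaluates Yang's spontaneous magnetization below it; units are chosen so that k = 1, and the names are invented for the example.

    import math

    def critical_temperature(e, k=1.0):
        """Critical temperature of the square-lattice Ising model with
        E1 = E2 = e, from Eq. (4): sinh(2E/kTc) = 1."""
        return 2.0 * e / (k * math.asinh(1.0))   # kTc/E is about 2.269

    def spontaneous_magnetization(t, e, k=1.0):
        """Yang's result, Eq. (6): M(0) = [1 - sinh**-4(2E/kT)]**(1/8)
        for T < Tc, and zero at or above Tc."""
        s = math.sinh(2.0 * e / (k * t))
        if s <= 1.0:                  # T >= Tc
            return 0.0
        return (1.0 - s ** -4) ** 0.125

    e = 1.0
    tc = critical_temperature(e)
    print(round(tc, 4))
    for t in (0.5 * tc, 0.9 * tc, 0.99 * tc):
        print(round(spontaneous_magnetization(t, e), 4))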
Island biogeography
The distribution of plants and animals on islands. Islands harbor the greatest number of endemic species. The relative isolation of many islands has allowed populations to evolve in the absence of competitors and predators, leading to the evolution of unique species that can differ dramatically from their mainland ancestors. Dispersal of species. Plant species produce seeds, spores, and fruits that are carried by wind or water currents, or by the feet, feathers, and digestive tracts of birds and other animals. The dispersal of animal species is more improbable, but animals can also be carried long distances by wind and water currents, or rafted on vegetation and oceanic debris. Long-distance dispersal acts as a selective filter that determines the initial composition of an island community. For example, land snails of many tropical Pacific islands are dominated by small-bodied species that have probably been carried by wind dispersal. Many species of continental origin may never reach islands unless humans accidentally or deliberately introduce them. Consequently, although islands harbor the greatest number of unique species, the density of species on islands (number of species per area) is typically lower than the density of species in mainland areas of comparable habitat. See POPULATION DISPERSAL. Unique morphological traits. Once a species reaches an island and establishes a viable population, it may undergo evolutionary change because of genetic drift, climatic differences between the mainland and the island, or the absence of predators and competitors from the mainland. Consequently, body size, coloration, and morphology of island species often evolve rapidly, producing forms unlike any related species elsewhere. Examples include the giant land tortoises of the Gal´apagos, and the Komodo dragon, a species of monitor lizard from Indonesia. See POLYMORPHISM (GENETICS); POPULATION GENETICS; SQUAMATA. If enough morphological change occurs, the island population becomes reproductively isolated from its mainland ancestor, and it is recognized as a unique species. Because long-distance dispersal is relatively infrequent, repeated speciation may occur as populations of the same species successively colonize an island and differentiate. The most celebrated example is Darwin’s finches, a group of related species that inhabit the Gal´apagos Islands and were derived from South American ancestors. The island species have evolved different body and bill sizes, and in some cases occupy unique ecological niches that are normally filled by mainland bird species such as warblers and woodpeckers. The morphology of these finches was first studied by Charles Darwin and constituted important evidence for his theory of natural selection. See ANIMAL EVOLUTION; SPECIATION. Theory. For most groups of organisms, the number of species increases in a nonlinear fashion with island area. Mathematically, the relationship can be described by a power function, S = kAz, where S is
Equilibrium number of species (Sequilibrium) is determined by the intersection of the immigration (I) and extinction (E) rate curves. P is the maximum number of species in the source pool.
the number of species on the island, A is the island area, and k and z are fitted constants. On a double logarithmic plot of species versus area, the relationship is linear, with the constant z describing the slope of the line. Although the slope of the species-area curve is steeper for isolated archipelagoes, the number of species per unit area tends to decrease with increasing isolation. Although descriptions of the effects of area and isolation on species richness began about a century ago, only since the late 1960s have these observations been incorporated into a quantitative model. The MacArthur-Wilson equilibrium theory of island biogeography proposes that the number of species on an island represents an ecological balance between ongoing colonization and extinction of populations (see illus.). The model assumes that an island receives its colonists from a permanent source pool (P) of mainland species. The colonization rate (number of new species per unit time) is at a maximum value (I) when the island is empty, and the extinction rate (number of established island populations going extinct per unit time) is at a maximum value (E) when all P species are present on the island. The equilibrium number of species is determined by the intersection of the colonization and extinction curves, which occurs at the point Sequilibrium = PI/(I + E). The model assumes that population size is proportional to island area, and that the risk of population extinction increases at small population size. Consequently, the extinction curve for small islands is steeper than for large islands, producing a trend toward more species at equilibrium as island size increases. The model also assumes that the immigration curve is shallower for more isolated islands, reducing the equilibrium number of species. The key prediction of the model is that island populations are transient and that colonizations and extinctions are frequent. The most successful test of the equilibrium theory was carried out on tiny mangrove islands in the Florida Keys that were experimentally defaunated and then recolonized by insects over a one-year period. The experiment confirmed
many predictions of the equilibrium model, although much of the turnover may have been from transient species that had not established viable populations. Nature reserves. The equilibrium theory of island biogeography has been used as a tool for guiding conservation decisions. Its principal prescription is that large, contiguous areas will conserve the maximum number of species. However, specific application of the model has been problematic because many plant and animal communities do not exhibit the turnover predicted by the model. Moreover, the model predicts only the number of species, not their identities. Finally, the model assumes an intact mainland fauna, which may no longer exist in fragmented landscapes. See LANDSCAPE ECOLOGY; ZOOGEOGRAPHY. Advances. Island biogeography theory has recently been extended to describe the persistence of single-species metapopulations. A metapopulation is a set of connected local populations in a fragmented landscape that does not include a persistent source pool region. Instead, the fragments themselves serve as stepping stones for local colonization and extinction. The most successful application of the metapopulation model has been to spotted owl populations of old-growth forest fragments in the northwestern United States. The model prescribes the minimum number, area, and spatial arrangement of forest fragments that are necessary to ensure the long-term persistence of the spotted owl metapopulation. See BIOGEOGRAPHY; ECOLOGICAL COMMUNITIES; ECOSYSTEM. Nicholas J. Gottelli
Bibliography. T. J. Case and M. L. Cody, Testing theories of island biogeography, Amer. Sci., 75:402-411, 1987; R. H. MacArthur and E. O. Wilson, The Theory of Island Biogeography, Princeton University Press, Princeton, NJ, 1969; K. A. McGuinness, Equations and explanations in the study of species-area curves, Biol. Rev., 59:423-440, 1984; D. Quammen, The Song of the Dodo: Island Biogeography in an Age of Extinctions, Scribner, New York, 1996; M. Williamson, Island Populations, Oxford University Press, Oxford, 1981.
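As an editorial illustration, and not part of the original article, the short Python sketch below evaluates the species-area power function and the MacArthur-Wilson equilibrium; the fitted constants and rates used are invented sample values.

    def species_area(area, k=10.0, z=0.25):
        """Species-area power function S = k * A**z (k and z are
        illustrative fitted constants, not values from the text)."""
        return k * area ** z

    def equilibrium_species(pool, i_max, e_max):
        """MacArthur-Wilson equilibrium, S = P*I / (I + E), where P is the
        source-pool size, I the maximum immigration rate, and E the
        maximum extinction rate."""
        return pool * i_max / (i_max + e_max)

    print(round(species_area(100.0), 1))                   # about 31.6 species
    print(round(equilibrium_species(100, 5.0, 3.0), 1))    # 62.5 species at equilibrium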
Isoantigen An immunologically active protein or polysaccharide present in some but not all individuals in a particular species. These substances initiate the formation of antibodies when introduced into other individuals of the species that genetically lack the isoantigen. Like all antigens, they are also active in stimulating antibody production in heterologous species. The ABO, MN, and Rh blood factors in humans constitute important examples; thus, elaborate precautions for typing are required in blood transfusions. See BLOOD GROUPS. Analogous situations exist for the bloods of most other animal species. Isoantigens are also believed responsible for the ultimate failure of tissue grafts between individuals of the same species, except that of the same genetic constitution or those
that have been rendered tolerant. See ACQUIRED IMMUNOLOGICAL TOLERANCE; TRANSPLANTATION BIOLOGY.
Isoantigens are to be distinguished from autoantigens, which are antigens active even in the species from which they are derived and in individuals who already possess the antigen. Brain and lens tissue, as well as sperm, constitute examples. These exceptions to the usual rule of nonantigenicity for selfconstituents may be more apparent than real, however, since the substances cited are all protected to some degree from contact with the blood, and thus normally do not reach the sites of antibody formation except after experimental manipulation. Autoantibodies may also be produced in various disease states, perhaps as a result of modification of normal host tissue by the infecting microorganism or by altered host metabolism. Examples are the paroxysmal hemoglobinuria observed in syphilis, acquired hemolytic anemia, or some of the manifestations in rheumatic fever. See ANTIBODY; ANTIGEN; AUTOIMMUNITY; POLYSACCHARIDE; PROTEIN. Margaret J. Polley Bibliography. E. L. Cooper et al. (eds.), Developmental and Comparative Immunology, 1987; D. Male et al., Advanced Immunology, 1987; S. B. Mizel and P. Jaret, The Human Immune System: The New Frontier in Medicine, 1986; P. N. Plowman, Hematology and Immunology, 1987.
Isobar (meteorology) A curve along which pressure is constant. Leading examples of its uses are in weather forecasting and meteorology. The most common weather maps are charts of weather conditions at the Earth's surface and mean sea level, and they contain isobars as principal information. Areas of bad or unsettled weather are readily defined by roughly circular isobars around low-pressure centers at mean sea level. Likewise, closed isobars around high-pressure centers define areas of generally fair weather (Fig. 1). See AIR PRESSURE. Geostrophic wind. A principal use of isobars stems from the so-called geostrophic wind, which approximates the actual wind on a large scale. The direction of the geostrophic wind is parallel to the isobars, in the sense that if an observer stands facing away from the wind, higher pressures are to the person's right if in the Northern Hemisphere and to the left if in the Southern. Thus, in the Northern Hemisphere, flow is counterclockwise about low-pressure centers and clockwise about high-pressure centers, with the direction of the flow reversed in the Southern Hemisphere. The speed of the geostrophic wind is inversely proportional to the distance between isobars drawn at regular close intervals. More precisely, the geostrophic wind is as in Eq. (1).

V = (1/(fρ)) ∂p/∂n   (1)
Fig. 1. Weather map at mean sea level for 7:00 A.M., Eastern Standard Time, January 14, 1979. Isobars are labeled in millibars; 1 mb = 10² Pa. (After U.S. Department of Commerce, National Oceanic and Atmospheric Administration, Environmental Data and Information Service, Daily Weather Maps, Weekly Series, January 8-14, 1979)
In Eq. (1):
V = geostrophic wind speed
f = 2ω sin φ, called the Coriolis parameter
ω = 7.292 × 10⁻⁵ radian s⁻¹, the angular velocity of the Earth
φ = latitude, positive in the Northern Hemisphere, negative in the Southern
ρ = density of the air, varying about a value of 1.2 × 10⁻³ g cm⁻³ at mean sea level
p = pressure, varying about a value of 1013 millibars (101.3 kPa) at mean sea level
n = distance normal to the isobar and toward higher pressure
Note that f is positive in the Northern Hemisphere and negative in the Southern. The geostrophic wind law written this way, then, provides a single rule for both Northern and Southern hemispheres, if it is understood that a change in sign of V implies reversal in direction.
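As an editorial illustration of Eq. (1), and not part of the original article, the Python sketch below evaluates the geostrophic wind speed in cgs units; the function name is invented, and the sample numbers match the worked example given below.

    import math

    def geostrophic_wind(dp_dn, latitude_deg, rho=1.2e-3, omega=7.292e-5):
        """Geostrophic wind speed from Eq. (1), V = (1/(f*rho)) * dp/dn,
        with dp/dn in dyn cm**-2 per cm and rho in g cm**-3;
        returns V in cm s**-1."""
        f = 2.0 * omega * math.sin(math.radians(latitude_deg))   # Coriolis parameter
        return dp_dn / (f * rho)

    # 4-mb isobar spacing over 250 km at latitude 43.3 degrees
    dp = 4000.0      # 4 mb = 4000 dyn cm**-2
    d = 2.5e7        # 250 km in cm
    print(round(geostrophic_wind(dp / d, 43.3) / 100.0, 1))   # about 13.3 m s**-1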
When the geostrophic wind is calculated from isobars on a weather map, Eq. (1) is approximated by Eq. (2) [Fig. 2],

V = (1/(fρ)) Δp/d   (2)

where Δp is the interval of pressure at which the isobars are drawn, and d is the distance between two adjacent isobars. Thus, if isobars are drawn at 4-millibar intervals, then Δp = 4000 g cm⁻¹ s⁻²; and if d = 250 km = 2.5 × 10⁷ cm, f = 10⁻⁴ radian s⁻¹ (φ = 43.3°N), and ρ = 1.2 × 10⁻³ g cm⁻³, then

V = 4000/(10⁻⁴ × 1.2 × 10⁻³ × 2.5 × 10⁷) = 1333 cm s⁻¹ = 13 m s⁻¹ = 25 knots

Fig. 2. Schematic of two isobars (1012 and 1016 mb), separated by a distance d, illustrating the elements in Eq. (2).
Although the geostrophic wind is useful everywhere as an approximation to the wind (except in low latitudes where f approaches zero), at mean sea level it is only a very rough approximation. The departure of the actual wind at mean sea level from
the geostrophic is due principally to friction and turbulence, which are highly variable; and also to the impossibility of precisely reducing pressure to mean sea level from plateaus such as the Rocky Mountains and Himalayas. Thus, although the speed of the geostrophic wind varies with air density, at mean sea level it is hardly worthwhile to take into account the variations of air density. A constant, such as ρ = 1.2 × 10⁻³ g cm⁻³, is often used instead. In the free atmosphere, however, where modern aircraft fly, the geostrophic wind is quite accurate as an approximation to the actual wind. Above 1 km (0.6 mi) or so, turbulence and frictional forces are very small, and at even greater heights, above Earth's plateaus, the pressure reduction problem vanishes. In the upper atmosphere, therefore, it is worthwhile to take into account the 15% or so variation of density in the horizontal. See GEOSTROPHIC WIND. Isobaric surfaces. In practice, density variations in the free atmosphere are rather neatly taken into account by constructing weather charts on isobaric surfaces (surfaces of constant pressure), rather than on level surfaces (surfaces of constant height). Isobars are meaningless on isobaric surfaces, since pressure is constant along any arbitrary curve on such a surface. Instead, curves of constant height are used. These are called isohyets or, more commonly, height contours. The geostrophic wind on isobaric surfaces is as in Eq. (3),

V = (g/f) ∂z/∂n   (3)

where z is height, g is the force of gravity, and n is distance normal to the contour in the direction of increasing height. See WEATHER MAP. Frederick G. Shuman

Isobar (nuclear physics)
One of two or more atoms which have a common mass number A but which differ in atomic number Z. Thus, although isobars possess approximately equal masses, they differ in chemical properties; they are atoms of different elements. Isobars whose atomic numbers differ by unity cannot both be stable; one will inevitably decay into the other by β⁻ emission (Z → Z + 1), β⁺ emission (Z → Z − 1), or electron capture (Z → Z − 1). There are many examples of stable isobaric pairs, such as 50Ti (Z = 22) and 50Cr (Z = 24), and four examples of stable isobaric triplets. At most values of A the number of known radioactive isobars exceeds the number of stable ones. See ELECTRON CAPTURE; RADIOACTIVITY. Henry E. Duckworth

Isobaric process
A thermodynamic process during which the pressure remains constant. When heat is transferred to or from a gaseous system, a volume change occurs at constant pressure. This thermodynamic process can be illustrated by the expansion of a substance when it is heated. The system is then capable of doing an amount of work on its surroundings. The maximum work is done when the external pressure Pext of the surroundings on the system is equal to P, the pressure of the system. If V is the volume of the system, the work performed as the system moves from state 1 to 2 during an isobaric thermodynamic process, W12, is the maximum work, as given by Eq. (1).

W12 = ∫₁² Pext dV = ∫₁² P dV = P(V2 − V1)   (1)

For an ideal gas the volume increase is a result of a corresponding temperature increase, so that Eq. (1) yields Eq. (2),

W12 = nR(T2 − T1)   (2)

where n is the number of moles of gas and R is the gas constant. Then W12 represents the work done by the system and is positive if V2 > V1 (that is, T2 > T1 for an ideal gas). See HEAT TRANSFER. By the first law of thermodynamics, the change of the internal energy, U, in any process is equal to the difference between the heat gained, Qp, and the work done by the system, W12; thus, Eq. (3) holds. From Eqs. (1) and (3), it follows that Eq. (4) holds for isobaric processes.

U2 − U1 = ΔU12 = Qp − W12   (3)

Qp = (U2 + PV2) − (U1 + PV1)   (4)

For isobaric processes, it is useful to introduce the enthalpy, H, which is given by the sum of the internal energy U and PV. Then Eq. (5) can be formulated to represent Qp, the transferred heat at constant pressure,

Qp = H2 − H1 = ∫₁² Cp dT   (5)

where Cp = (dQ/dT)|p is the heat capacity at constant pressure. For an ideal gas, Eq. (6) holds.

Qp = Cp(T2 − T1)   (6)

See ENTHALPY. If the isobaric process is also a reversible process, Eq. (7) is obtained, where S is the entropy.

Qp = ∫₁² T dS   (7)

See ENTROPY. For an ideal gas the internal energy change can be expressed in terms of the heat capacity at constant volume, as in Eq. (8). The combination of Eqs. (2), (3), (6), and (8) yields Eq. (9).

ΔU12 = CV(T2 − T1)   (8)

Cp = CV + nR   (9)
From Eq. (8) it follows that, for an isometric process (fixed volume), heat goes into internal energy only. However, from Eqs. (9) and (3) it follows that, for an isobaric process, Cp > CV, and the heat input is converted into internal energy increase as well as work output by the system. See ADIABATIC PROCESS; ISOMETRIC PROCESS; THERMODYNAMIC PROCESSES. For a comparison of the isobaric process with other processes involving a gas See POLYTROPIC PROCESS. Philip E. Bloomfield Bibliography. D. Halliday, R. Resnick, and J. Walker, Fundamentals of Physics, 6th ed., 2000; J. R. Howell and R. O. Buckius, Fundamentals of Engineering Thermodynamics, 2d ed., 1992; G. J. Van Wylen, R. E. Sonntag, and C. Borgnakke, Fundamentals of Classical Thermodynamics, 4th ed., 1994; M. W. Zemansky, M. M. Abbott, and H. C. Van Ness, Basic Engineering Thermodynamics, 2d ed., 1975.
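As an editorial illustration of Eqs. (2), (3), and (6), and not part of the original article, the short Python sketch below evaluates the work, heat, and internal-energy change for an ideal gas heated at constant pressure; the molar heat capacity used is a typical value for a diatomic gas, not a figure from this article.

    def isobaric_process(n_moles, t1, t2, cp_molar=29.1, r=8.314):
        """Work, heat, and internal-energy change for an ideal gas heated
        at constant pressure from T1 to T2 (SI units, J and K).

        W  from Eq. (2): W = n*R*(T2 - T1)
        Qp from Eq. (6): Qp = Cp*(T2 - T1), with Cp = n*cp_molar
        dU from Eq. (3): dU = Qp - W
        """
        w = n_moles * r * (t2 - t1)
        qp = n_moles * cp_molar * (t2 - t1)
        du = qp - w
        return w, qp, du

    w, qp, du = isobaric_process(1.0, 300.0, 400.0)   # one mole of diatomic gas
    print(round(w), round(qp), round(du))             # about 831, 2910, 2079 J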
Isobryales An order of the true mosses (subclass Bryidae). This order is difficult to define precisely, but includes plants that generally grow from a creeping primary stem with leaves reduced or essentially lacking and plants that have spreading to ascending secondary stems which may be pinnately branched (see illus.). Paraphyllia and pseudoparaphyllia are sometimes present on the stems. The leaves may have single or double and sometimes short costae. The cells may be short or elongate and smooth or papillose, with those at basal angles sometimes differentiated. The sporophytes are lateral, usually with
Fontinalis duriaei, an example of the order Isobryales. (a) Leaf. (b) Apex of leaf. (c) Trellis and teeth. (d) Alar cells. (After W. H. Welch, Mosses of Indiana, Indiana Department of Conservation, 1957)
elongate setae and capsules. The double peristome, sometimes reduced, consists of 16 teeth which are papillose on the outer surface, or less often cross-striate at the base, and an endostome with narrow segments and a low basal membrane or none at all. The calyptrae are cucullate and naked, or mitrate and hairy. The order is surely heterogeneous in composition. It consists of about 19 families and 124 genera, some of which may be better placed in the Hypnales. See BRYIDAE; BRYOPHYTA; BRYOPSIDA; HYPNALES. Howard Crum
Isocyanate A derivative of isocyanic acid. Isocyanates are represented by the general formula R-N=C=O, where R is predominantly alkyl or aryl; however, stable isocyanates in which the N=C=O group is linked to elements such as sulfur, silicon, phosphorus, nitrogen, or the halogens have also been prepared. Most members of this class of compounds are liquids that are sensitive to hydrolysis and are strong lacrimators. Isocyanates are extremely reactive, especially toward substrates containing active hydrogen. They have found wide use in the commercial manufacture of polyurethanes, which are used in making rigid and flexible foam, elastomers, coatings, and adhesives. Preparation. Most isocyanates are prepared by reacting excess phosgene (COCl2) with primary amines in high-boiling-point solvents, as in reaction (1). The reaction involves several steps: the
R-NH2 + COCl2 -(−HCl)→ R-NHCOCl -(Δ)→ R-N=C=O + HCl   (1)
first, leading to the carbamoyl chloride R-NHCOCl (1), is exothermic, while the hydrogen chloride (HCl) abstraction to form the isocyanate requires heat. The excess of phosgene prevents or suppresses the formation of side products, such as ureas. The industrial manufacture of isocyanates uses continuous processes in which streams of amine and phosgene, optionally both in solvents, are mixed and the resultant reaction mixtures heated, while the by-product hydrogen chloride is vented off. Products are typically purified by (vacuum) distillation. Other methods of preparation, such as the reaction of salts of isocyanic acid with alkyl halides or the thermal or photolytic degradation of carbonyl azides, are less common and are used only in special cases. The direct conversion of aromatic amines or nitro compounds with carbon monoxide (CO) in the presence of platinum metal catalysts [reaction (2)]

R-NO2 + 3CO → R-N=C=O + 2CO2   (2)

has been explored as a viable alternative for the industrial manufacture of isocyanates in place of the
phosgenation technology, which is less desirable in terms of the effect on the environment. However, technical problems have so far prevented the commercialization of any of the various proposed routes. Characterization. The heterocumulene system N=C=O absorbs strongly in the infrared region at approximately 2250-2275 cm⁻¹; this band is very characteristic and is widely used for the spectroscopic identification of isocyanates. Isocyanates react readily with alcohols to form urethanes (also known as carbamates), which are used for their characterization. See INFRARED SPECTROSCOPY. Reactions. Isocyanates are very reactive compounds. They have been used extensively as building blocks of nitrogen heterocycles, especially in the synthesis of fused-ring systems. Typically, the C=N double bond of the isocyanate group reacts readily with active hydrogen-containing substrates, such as amines, alcohols, or water. Alcohols react with isocyanates to form urethanes [reaction (3)],

R-N=C=O + R′OH → R-NHCOOR′   (3)

while pri-
mary amines give N,N-disubstituted ureas. In the latter reaction, water adds to give initially N-substituted carbamic acids, which undergo decarboxylation at room temperature to re-form amine and carbon dioxide [reaction (4)].

R-N=C=O + H2O → [R-NHCOOH] → R-NH2 + CO2   (4)

Most often, excess isocyanate consumes the liberated amine to yield symmetric disubstituted ureas [reaction (5)].

RNH2 + R-N=C=O → RNH-CO-NHR   (5)

See AMINE; UREA. Isocyanates readily undergo cycloaddition reactions with other compounds having activated double bonds, such as C=C, C=N, or C=O. The simplest such adducts are 2 + 2 cycloadducts, in which four-membered ring heterocycles are formed from an enamine [reaction (6)], a compound in which amino groups are directly linked to carbon-to-carbon double bonds. [The ring-forming structural scheme for reaction (6) is not reproduced here.] More common, however, are adducts formed from one double-bond component and two molecules of isocyanate, or vice versa. The reactions have been shown to proceed stepwise via dipolar intermediates. An example is the formation of a hexahydro-1,3,5-triazine derivative from an imine and isocyanate [reaction (7)]. The 1,4-dipolar intermediate can be intercepted by either a second molecule of isocyanate or an imine. [The structural scheme for reaction (7) is not reproduced here.] Systems containing conjugated double bonds have been shown to react with (predominantly) aryl and sulfonyl isocyanates across both double bonds in a hetero-diene synthesis; an example is shown in reaction (8), the formation of an oxazolopyrimidine from 2-vinyloxazole. [The structural scheme for reaction (8) is not reproduced here.] Isocyanates are also known to react with each other to form low-molecular-weight oligomers or polymers under certain catalyzed conditions. Most commonly they form dimers, 1,3-diazetidinediones, or trimers with a hexahydro-1,3,5-triazinetrione structure, also known as isocyanurates [reaction (9)]. [The cyclic structural scheme for reaction (9) is not reproduced here.] These cyclizations have also been shown to involve 1,4-dipolar intermediates; they are thermally reversible, especially in the case of the cyclodimers, which dissociate back to starting isocyanates on heating to approximately 180-200°C (350-390°F). An extension of this reaction type is the catalyzed polymerization of alkyl and aryl isocyanates to polyureas or 1-nylons at low temperature. These homopolymers, which are stabilized by addition of base as chain terminators, are thermolabile and usually melt with decomposition at 180-200°C (350-390°F); they have not found industrial application. See POLYMERIZATION. The thermal generation of N,N-disubstituted carbodiimides from isocyanates in the absence of catalysts is likely to involve a labile asymmetric isocyanate dimer in which the C=O group of the heterocumulene is involved in the cycloadduct formation [reaction (10)].

2 R-N=C=O -(Δ)→ R-N=C=N-R + CO2   (10)

Other heterocumulenes, such as carbodiimides (RN=C=NR), isothiocyanates (RN=C=S), and ketenes (R2C=C=O), are also known to undergo cycloadditions with isocyanates. Isocyanates react with organometallic and phosphorus compounds primarily by inserting into the metal-X bond (X = OR, H, halogen, NR2, and so forth). Several of these additions, especially those involving certain tin (Sn) compounds with Sn-OR or Sn-O-Sn bonds, have been studied in detail, since they exhibit catalytic activity in many isocyanate reactions such as trimerization and urethanization. See ORGANOMETALLIC COMPOUND. Isocyanate-based polymers. Polymers based on the reactions of isocyanates were first developed in 1937. Diisocyanates react with difunctional reagents, such as diols, to form addition polymers with a wide variety of properties. The flexibility in the choice of starting materials (diisocyanate, diol, diamine, diacid, and so forth) and consequently in the multitude of possible adducts make this product group unique in the field of polymeric materials. Two aromatic diisocyanates, tolylene diisocyanate (TDI; 2) and di(4-isocyanatophenyl)methane (MDI; 3), have become the major starting materials for a family of polymeric products, such as flexible

CH3-C6H3(N=C=O)2   (2)

CH2(C6H4-N=C=O)2   (3)
O
and rigid polyurethane foams used in construction and appliance insulation, automotive seating, and furniture. Elastomers based on MDI, polyols, and polyamines are widely used in the automotive industry, where reaction injection molding technology is used for the manufacture of exterior parts such as body panels and bumpers. See POLYOL. Thermoplastic polyurethane elastomers (TPU) are used in the molding and extrusion of many industrial and consumer products with superior abrasion resistance and toughness. See POLYURETHANE RESINS. The trimerization to polyisocyanurates and the formation of polyamides from dicarboxylic acids have been utilized to synthesize polymers with excellent thermal properties. Aliphatic diisocyanates, notably 1,6-diisocyanatohexane (HDI), fully hydrogenated MDI (H12MDI), and isophorone diisocyanate (IPDI), have become building blocks for color-stable polyurethane coatings and elastomers with high abrasion resistance. See POLYAMIDE RESINS; POLYMER. Toxicology. Most low-molecular-weight isocyanates are strong lacrimators, and exposure to vapors should be avoided. Inhalation of vapors can cause pulmonary edema. Isocyanates are also known to be strong skin irritants that can cause allergic eczema and bronchial asthma, and they have been known to produce severe skin and lung hypersensitivity. Most isocyanates have a low level of oral toxicity. Amine precursors of several industrial diisocyanates are believed to be carcinogenic. Exposure limits have been set for isocyanate vapors in industrial environments. Reinhard H. Richter Bibliography. V. I. Gorbatenko and L. I. Samarai, Synthesis and reactions of haloalkyl isocyanates, Synthesis, 1980:85–110, 1980; G. Oertel, Polyurethane Handbook, 2d ed., 1993; S. Patai (ed.), The Chemistry of Cyanates and Their Thioderivatives, 1977; A. A. R. Sayigh, H. Ulrich, and W. J. Farrissey, Jr., Diisocyanates in J. K. Stille (ed.), Condensation Polymers, pp. 369–476, 1972; H. Ulrich, Synthesis of polymers from isocyanates in polar solvents, J. Polym. Sci.: Macromol. Rev., 11:93–133, 1976.
Isoelectric point The pH of the acidity or alkalinity of the dispersion medium of a colloidal suspension or an ampholyte at which the solute does not move in an electrophoretic field. The term isoelectric point is abbreviated pI. Ampholytes. These molecules carry acid as well as basic functional groups. When dissolved in a suitable medium, ampholytes may acquire positive and negative charges by dissociation or by accepting or losing protons, thereby becoming bipolar ions (zwitterions). Ampholytes may be as small as glycine and carry just one chargeable group each; or as large as polyampholytes (polyelectrolytes that carry positive charges, negative charges, or both). They may possess molecular weights in the hundreds of thousands like proteins or in the millions like nucleic acids, and
pI = (pKa + pKb)/2 = 6.1
The net charges of alanine vary in an easily recognizable way from the far acid to far alkaline pHs, because the 2 pK values of the two chargeable groups lie far apart. The same is true for the titration curve of
12
pI
10
pKb = 9.87
8 pH
isoelectric pH = 6.1
6 4 pKa = 2.34 2 0
0.5
1.5 _ 2.0 1.0 equivalents of OH
Fig. 1. Titration of alanine with sodium hydroxide (NaOH), showing the course of pH with added fractional equivalents. The four arrows show, from left to right, the pKa [1/2 cations: NH3+(CH3)COOH, and 1/2 zwitterions: NH3+(CH3)CHCOO−], the pl (all zwitterions), the pKb (1/2 zwitterions, 1/2 anions), and the end of titration, when all alanine molecules are in the anionic form: NH2(CH3)CHOO−.
net charge Q
+2
+ αNH3
+1
NH+3 pl = 9.74
Iysine
0 −
−1
COO 100% negative charge 3 4 5
7 pKbα = pKb = 8.95 10.53 pH ∋
pI =
100% positive charge (of amino groups)
≈
carry many hundreds of chargeable groups. See ION; NUCLEIC ACID; PH; PROTEIN. Determination. An example of establishing the isoelectric point is shown by the course of the pH changes during the titration of alanine [NH2CH(CH3)COOH], a 1:1 ampholyte, meaning a molecule that carries one positively and one negatively ionizable group. Starting from acid solution (Fig. 1), relatively small pH changes (with alkali as the titrant) are observed between pH 2 and 3 (acidic), and again between pH 9.5 and 10.5 (alkaline), caused by the buffering capacity of the carboxyl ( COOH) and amine ( NH2) groups as weak electrolytes. The pH for 1/2 equivalence corresponds to the pK of the acid function (a value related to the equilibrium constant), where one-half of the alanine molecules still carry only a positive charge (−NH3+), while the other half are also negatively charged (−COO−). Thus alanine exists in the form of zwitterions. See PK. In the case of alanine, the titration curve is repeated with close symmetry on its alkaline side. After adding 1.5 equivalents of sodium hydroxide (NaOH), the pH is that of the pK of the basic groups, which is also the range of maximum buffering capacity for alkaline pHs. At this point, one-half of the alanine molecules are still in the zwitterionic state, and onehalf are entirely negatively charged. Halfway between the acid and alkaline titration ranges, after adding 1 equivalent of NaOH and thus creating acid-base neutrality, all alanine molecules are in the zwitterionic, net-zero-charge state, and the pH ≡ pI. The value of pI in this example can be calculated as the average of the two [basic (b) and acidic (a)] pK values by the equation
∋
476
14
Fig. 2. Course of the charges carried by lysine (pKa = 2.18) molecules in dilute aqueous solution with pH during a titration with alkali. Shown are the net charge, going from +2 at low pH to +1 between pH 4.5 and 6.5, the isoelectric point at pH 9.74, the net charge approaching −1 at about pH 13, and the courses of the individual charges on the three chargeable groups. The break in the curve indicates that in this range the course of the net charges (Q) is not shown completely, but eventually reaches a value of +2.
the amino acid lysine, NH2CH[(CH2)4NH2]COOH, a 2:1ampholyte, meaning a molecule that carries three ionizable (chargeable) groups, two of which are the same and one of opposite sign (Fig. 2). The inflection points mark the pK’s. The pH at which the curve of overall charge crosses from positive to negative values marks the isoelectric point at pH = 9.74 = pI. Here, lysine is in the purely zwitterionic state. For molecules that carry four or more chargeable groups, that is, for polyelectrolytes, the courses of the overall titration curves may no longer reflect the individual dissociation steps clearly, as the dissociation areas usually overlap. The isoelectric point then becomes an isoelectric range, such as for pigskin (parent) gelatin, a protein that exhibits an electrically neutral isoelectric range from pH 7 to pH 9. Since ampholytes in an electric field migrate according to their pI with a specific velocity to the cathode or anode, the blood proteins, for example, can be separated by the techniques of gel or capillary-zone electrophoresis. See ELECTROPHORESIS; TITRATION. Isoelectric focusing. In the method of isoelectric focusing, a pH gradient within a chromatographic column parallels the potential gradient. The components (for example, proteins) of a mixture will each migrate in the electric field until they reach their respective isoelectric zone, at which point they will stop moving. Different components thus become neatly separated along the electrophoretic path. Influence of change in reactivities. Since the pI depends on the pKs of the individual groups of an ampholyte, anything that affects the reactivities of the groups also affects the pI. This is the case when a variety of cosolutes, especially electrolytes, are present. An example is gelatin, whose midpoint of the isoelectric range will move from 8 to 6 if the concentration of codissolved sodium chloride (NaCl) changes from 10−3 to 10−1 N. Similarly, the solubility of an ampholyte, which depends on the solvation
Isoetales of its charge-carrying groups, varies with pH, with codissolved electrolyte, or with other cosolutes. At the isoelectric point, many ampholytes show a maximum of solution turbidity, some form of precipitation, or a larger friction factor and a maximum or minimum of viscosity. See CONCENTRATION SCALES; ELECTROLYTE. Ion-exchange chromatography. The important separation technique of ion-exchange chromatography is based on the selective adsorption of ampholytes on the resins with which the column is filled, at a given pH. For example, the larger the net positive charge of an ampholyte, the more strongly will it be bound to a negative ion-exchange resin and the slower will it move through the column. By rinsing with solutions of gradually increasing pH, the ampholytes of a mixture can be eluted and made to emerge separately from the column and be collected. Automated amino acid analyzers are built on this principle. Significance. The notion that some ampholytes may pass with changing pH through a state of zero charge (zero zeta potential) on their way from the positively to the negatively charged state has become so useful for specifying and handling polyampholytes that it was extended to all kinds of colloids, and to solid surfaces that are chargeable in contact with aqueous solutions. Practically all metal oxides, hydroxides, or hydroxy-oxides become charged by the adsorption of hydrogen ions (H+) or hydroxide ions (OH−), while remaining neutral at a specific pH. Strictly speaking, the isoelectric point of electrophoretically moving entities is given by the pH at which the zeta potential at the shear plane of the moving particles becomes zero. The point of zero charge at the particle (solid or surface) is somewhat different but often is not distinguished from the isoelectric point. It is determined by solubility minima or, for solid surfaces, is found by the rate of slowest adsorption of colloids (for example, latexes) of well-defined charge. See AMINO ACIDS; COLLOID; ELECTROKINETIC PHENOMENA; ION EXCHANGE; IONSELECTIVE MEMBRANES AND ELECTRODES. F. R. Eirich Bibliography. P. C. Hiemenz, Principles of Colloid and Surface Chemistry, 3d ed., 1997; J. W. Jorgenson, New directions in electrophoretic methods, Amer. Chem. Soc. Symp., 335:182–198, 1987; N. Kallay et al., Determination of the isoelectric points of several metals by an adhesive method, J. Phys. Chem., 95:7028–7032, 1991; P. G. Rhigetti, Isoelectric Focusing, 1983; C. Tanford, Physical Chemistry of Macromolecules, 1961.
Isoelectronic sequence A term used in spectroscopy to designate the set of spectra produced by different chemical elements ionized in such a way that their atoms or ions contain the same number of electrons. An atom or ion with the same number of electrons will generally have spectra with similar features, as the number of electrons determines most atomic properties. The sequence in the table is an example. Since the
Example of isoelectronic sequence

Designation of spectrum    Emitting atom or ion    Atomic number, Z    Number of electrons
Ca I                       Ca                      20                  20
Sc II                      Sc+                     21                  20
Ti III                     Ti2+                    22                  20
V IV                       V3+                     23                  20
Cr V                       Cr4+                    24                  20
Mn VI                      Mn5+                    25                  20
neutral atoms of these elements each contain Z electrons, removal of one electron from scandium, two from titanium, and so forth, yields a series of ions all of which have 20 electrons. Their spectra are therefore qualitatively similar, but the spectral terms (energy levels) increase approximately in proportion to the square of the core charge, just as they depend on Z2 in the one-electron sequence H, He+, Li2+, and so forth. As a result, the successive spectra shift progressively toward shorter wavelengths (higher energies), but maintain similar spectral features. Isoelectronic sequences are useful in predicting unknown spectra of ions belonging to a sequence in which other spectra are known. See ATOMIC STRUCTURE AND SPECTRA. F. A. Jenkins; W. W. Watson; Alfred S. Schlachter
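As an editorial illustration of the bookkeeping behind an isoelectronic sequence, and not part of the original article, the short Python sketch below reproduces the spectrum designations of the calcium-like (20-electron) sequence shown in the table; the function name is invented for the example.

    def spectrum_designation(z, n_electrons=20):
        """Spectroscopic designation for the ion of atomic number z that
        retains n_electrons electrons: I = neutral, II = singly ionized, etc."""
        stage = z - n_electrons + 1
        if stage < 1:
            raise ValueError("ion cannot have more electrons than the neutral atom")
        numerals = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]
        return numerals[stage - 1]

    symbols = {20: "Ca", 21: "Sc", 22: "Ti", 23: "V", 24: "Cr", 25: "Mn"}
    for z, sym in symbols.items():
        print(sym, spectrum_designation(z))   # Ca I, Sc II, Ti III, V IV, Cr V, Mn VI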
Isoetales An order in the class Lycopsida that diverged from the Lepidodendrales in the Late Devonian. These two groups have several characters that are not found in other lycopsids, notably a centralized, shootlike rooting structure (rhizomorph) that allows finite growth, wood production, and tree-sized dimensions. Isoetaleans evolved from trees as an increasingly specialized and reduced lineage; all but the earliest are small-bodied shrubs and pseudoherbs. A reduced morphology characterizes the only living isoetalean genus, Isoetes. See LEPIDODENDRALES. Isoetes (see illus.) is a globally distributed genus of approximately 150 species. Many are poorly differentiated, as their simple morphology lacks taxonomically useful characters and hence indicates a high probability of evolutionary convergence. The slow-growing plants resemble a small grass or rush. The basal corm is a highly reduced, woody, bipolar rhizomorph, anchored in the substrate by rootlets and capped by a vestigial stem. It is surrounded by clasping quill-like leaves that function as sporophylls, developing large sporangia. As resources decline during the growing season, the sporangia that are formed on successive sporophylls switch from being “female” megasporangia to “male” microsporangia, and eventually to abortive sporangia. Megasporangia contain several hundred megaspores, whereas microsporangia contain numerous small microspores that germinate to release spermatozoids that are multiflagellate. Isoetaleans are delimited by the bilateral symmetry and ontogeny of their rhizomorphs, relative to the
477
478
Isolaimida rior transverse rows of punctations. The diagnostic characteristics of this superfamily are the presence of six hollow tubes around the oral opening and two whorls of six circumoral sensilla. Amphids are apparently absent, though some authors speculate that their function is taken over by the dorsolateral papillae of the second whorl. The triradiate stoma is elongate and has thickened walls anteriorly. The esophagus is clavate. The female ovaries are opposed and outstretched, and may have flexures. The gubernaculum, supporting the thickened male spicules, has two dorsal apophyses, and male preanal supplementary organs are papilloid. Paired caudal papillae are large in males, small in females. See NEMATA (NEMATODA). Armand R. Maggenti (a)
(b)
Morphology and anatomy of extant Isoetes. (a) Compact, cormose rhizomorph and unbranched stem of I. melanopoda. (b) More elongate, once-branched rhizomorph and stem of I. andicola. (After W. N. Stewart and G. W. Rothwell, Paleobotany and the Evolution of Plants, 2d ed., Cambridge University Press, 1993)
radial rhizomorph symmetry of the Lepidodendrales. The most likely origin of the Isoetales is from among the relatively primitive lepidodendraleans that possess bisporangiate cones resembling those of the Selaginellales. The group originated in the Late Devonian with poorly known small trees such as Lepidosigillaria, Cyclostigma, and later Protostigmaria-Lepidodendropsis. The Late Carboniferous Sporangiostrobus and Chaloneria were small trees and shrubs inhabiting coal swamps. The Early Triassic Pleuromeia and Early Cretaceous Nathorstiana (Pleuromeiales) resembled Chaloneria in growth habit but showed reductions of the rhizomorph. Plants resembling Isoetes occur in the Middle to Late Triassic periods, but the earliest definite Isoetes species are of Early Cretaceous age. See SELAGINELLALES. Isoetaleans inhabited Paleozoic coal swamps (Chaloneria), Mesozoic coastal marshes (Pleuromeia), and Cenozoic river and lake systems (Isoetes). They typically formed a major ecological component of low-diversity communities that lacked large trees. See LYCOPHYTA; LYCOPSIDA. Richard M. Bateman; William A. DiMichele Bibliography. K. B. Pigg, Evolution of isoetalean lycopsids, Ann. Mo. Bot. Gard., 79:589–612, 1992; J. E. Skog and C. R. Hill, The Mesozoic herbaceous lycopsids, Ann. Mo. Bot. Gard., 79:648–675, 1992; W. C. Taylor and R. J. Hickey, Habitat, evolution, and speciation in Isoetes, Ann. Mo. Bot. Gard., 79:613– 622, 1992.
Isolaimida An order of nematodes comprising the single superfamily Isolaimoidea. The order consists of one family and one genus. They are rather large for freeliving soil nematodes (0.1–0.2 in. or 3–6 mm) and are found in seldom-cultivated sandy soils. Some forms have anterior annulations, while others have poste-
Isomerization Rearrangement of the atoms within hydrocarbon molecules. Isomerization processes of practical significance in petroleum chemistry are (1) migration of alkyl groups, (2) shift of a single-carbon bond in naphthenes, and (3) double-bond shift in olefins. Migration of alkyl groups. An example of alkyl group migration (skeletal isomerization) is reaction (1). Isomerization to more highly branched conC C
C
C C C n -Hexane
C
C C C C C 2-Methylpentane C C
C
C
C
(1)
C 2,2-Dimethylbutane
figurations has commercial importance since it results in improvement in combustion quality in the automobile engine as measured by octane number and increased chemical reactivity because tertiary carbon atoms result. The unleaded, motor-method octane numbers of the hexane isomers shown in reaction (1) are 26.0, 73.5, and 93.4, respectively. Normal butane is converted to isobutane (which has a tertiary carbon atom) to attain chemical reactivity with olefins in alkylation reactions where n-butane is inert. Isomerization of paraffins is a reversible firstorder reaction limited by thermodynamic equilibrium which favors increased branching at lower temperatures. Undesirable cracking reactions leading to catalyst deactivation occur at higher temperatures. They are controlled by adding a cracking suppressor such as hydrogen. Conversion of normal butane to isobutane is the major commercial use of isomerization. Usually, it is carried out in either liquid- or vapor-phase over aluminum chloride catalyst promoted with hydrogen chloride. In the vapor-phase process (250– 300◦F or 120–150◦C), the aluminum chloride is often supported on bauxite. In the liquid-phase processes (180◦F or 82◦C), it is dissolved in molten antimony
trichloride or used in the form of a liquid complex with hydrocarbon. A second type of catalyst for vapor-phase isomerization (300–850°F or 150–450°C) is a noble metal, usually platinum, supported on a carrier. This may be alumina with halide added to provide an acidic surface. All the processes are selective (95–98% to isobutane). Approximately 60% of the n-butane feed is converted per pass to isobutane in the liquid-phase process. Isopentane, a high-octane component used in aviation gasoline, is made commercially by isomerization of n-pentane. Petroleum naphthas containing five- and six-carbon hydrocarbons also are isomerized commercially for improvement in motor-fuel octane numbers. Noble-metal catalyst is normally used with higher-molecular-weight feeds. Isomerization of paraffins above six carbon atoms is of less importance, since octane improvement is limited by predominance of monomethyl branching at equilibrium. Skeletal isomerization is an important secondary reaction in catalytic cracking and catalytic reforming. Aromatics and olefins undergo skeletal isomerization as do paraffins. Single-carbon bond shift. This process, in the case of naphthenes, is illustrated by reaction (2).

Methylcyclopentane ⇌ Cyclohexane   (2)

Cyclohexane and methylcyclohexane have been produced commercially by liquid-phase isomerization of the five-carbon ring isomers over aluminum chloride–hydrocarbon-complex catalyst promoted by hydrogen chloride. Conversion per pass is high, selectivity excellent, and reaction conditions mild (190°F or 88°C). Cyclohexane is a raw material for making nylon, and it may be dehydrogenated to benzene. Methylcyclohexane has been used to make synthetic nitration-grade toluene. Shift of a double bond. This process is usefully applied when a specific olefin is needed for chemical synthesis, as in reaction (3).

C—C—C=C ⇌ C—C=C—C   (3)

Double-bond shift occurs selectively over acidic catalysts at temperatures below 450°F (230°C). However, the proportion undergoing skeletal isomerization increases as temperature is increased until, at temperatures in the range of 600–950°F (300–510°C), equilibrium is approached at fairly high space velocities. Equilibrium favors movement of double bonds to the more stable internal positions (85.6% 2-butene at 400°F or 200°C), and octane improvement accompanies this shift; however, the increase of octane number normally is insufficient to justify the cost of processing thermally cracked gasolines solely for this purpose. This type of isomerization occurs as a secondary reaction in the catalytic cracking and catalytic polymerization processes, in part accounting for the high octane numbers of the gasolines. See AROMATIZATION; CRACKING; MOLECULAR ISOMERISM; PETROLEUM PROCESSING AND REFINING. George E. Liedholm

Isometric process A constant-volume thermodynamic process in which the system is confined by mechanically rigid boundaries. No direct mechanical work can be done on the surroundings by a system with rigid boundaries; therefore the heat transferred into or out of the system equals the change of internal energy stored in the system. This change in the internal energy, in turn, is a function of the specific heat and the temperature change in the system as in Eq. (1),

QV = U2 − U1 = ∫ CV dT, integrated from state 1 to state 2   (1)

where QV is the heat transferred at constant volume, U is the internal energy, CV is the heat capacity at constant volume, and T is the absolute temperature. If the process occurs reversibly (the system going through a continuous sequence of equilibrium states), Eq. (2)

QV = ∫ T dS, integrated from state 1 to state 2   (2)

holds, where S is the entropy. There is an increase in both the temperature and the pressure of a constant volume of gas as heat is transferred into the system. For a comparison of the isometric process with other processes involving a gas see POLYTROPIC PROCESS. Philip E. Bloomfield
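As a numerical illustration of Eqs. (1) and (2) of the isometric process, the short Python sketch below (not part of the original article; the gas, amount, and temperatures are assumed for the example) evaluates the heat added and the entropy change when an ideal monatomic gas with a temperature-independent CV is heated at constant volume, where the integrals reduce to QV = nCV(T2 − T1) and S2 − S1 = nCV ln(T2/T1).

```python
# Illustrative sketch: constant-volume heating of an ideal monatomic gas.
# All numerical inputs are assumed values chosen for the example.
import math

R = 8.314               # gas constant, J/(mol K)
n = 1.0                 # amount of gas, mol (assumed)
Cv = 1.5 * R            # molar heat capacity at constant volume, monatomic ideal gas
T1, T2 = 300.0, 400.0   # initial and final temperatures, K (assumed)

Q_v = n * Cv * (T2 - T1)          # Eq. (1): Q_V = U2 - U1 = integral of C_V dT
dS = n * Cv * math.log(T2 / T1)   # Eq. (2): dS = C_V dT / T along the constant-volume path

print(f"Heat added Q_V = {Q_v:.0f} J")      # about 1247 J
print(f"Entropy change = {dS:.2f} J/K")     # about 3.59 J/K
```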
Isopoda An order of invertebrate crustaceans related to crabs, shrimps, and lobsters, all members of the class Malacostraca. The closest relatives of Isopoda are the amphipods, mysids (opposum shrimps), cumaceans, and tanaids, all of which are placed in the superorder Peracarida. Isopods are generally small but very common, highly diversified, and occurring in marine, freshwater, and terrestrial habitats. Sow bugs, pill bugs, and woodlice, as well as their marine relatives (such as gribbles and sea slaters), belong to this group. See CRUSTACEA; MALACOSTRACA; PERACARIDA. Morphology. Isopods are characterized by the lack of a carapace (an outer case or covering made of chitin), having the first thoracic segment fused with the head, one pair of maxillipeds (one of the four sets of paired mouthparts used for feeding), and usually seven pairs of similar legs, although the first leg may be clawlike. Adults of the suborder Gnathiidea possess five pairs of legs. As in most crustaceans, the body has three parts: the head (cephalon), thorax (pereon), and abdomen (pleon) [see illustration]. The cephalon bears sessile eyes and two pairs of antennae, the first of which is chemosensory and the second tactile in function. Each of the seven segments of the pereon
Limnoria (Flabellifera), female; labeled structures include the cephalon (head) with antenna, the pereon (thorax, seven segments) with pereopods (legs) and brood pouch with eggs, and the pleon (abdomen, six segments) with pleopods and pleotelson (telson); scale bar 1 mm. (After R. J. Menzies, The comparative biology of the wood-boring isopod crustacean Limnoria, Museum of Comparative Zoology, Harvard Coll. Bull., 112(5):363–388, 1954)
(pereonites) normally bears a pair of legs called pereopods. These are used for locomotion and grabbing prey. The abdomen typically consists of five segments or pleonites (which may be fused in some) plus a sixth segment called the pleotelson. Uropods are the paired appendages at the base of the pleotelson; these are used in orientation and swimming. Each of the pleonites has a ventral pair of foliaceous appendages (leaflike pleopods) which are used for swimming and respiration. Ecology. At present there are more than 10,400 described species of isopods. Of those, about 4200 species are terrestrial, and these include the familiar sow bugs, pill bugs, roly-polys, and woodlice. They may be found in any habitat on land, including forests, caves, deserts, city sidewalks, and greenhouses. Terrestrial isopods feed on algae, moss, bark, and decaying plant and animal matter. Like insects, they play a role in the breakdown and recycling of organic nutrients. Cellulose digestion results from the presence of bacteria in the digestive system. In the continental United States and Canada, there are about 120 known species of terrestrial isopods. The most common are widely distributed and have probably been introduced from Europe. A majority of the most abundant species in North America are concentrated in the northeast and northwest, possibly a result of the transport of their ancestors into harbors by human immigrants. Marine isopods, numbering more than 5600 species, are found at all latitudes and in all marine habitats, from the intertidal zone to the deep sea, including coral reefs, sponges, algae, and seagrass beds. They range in size from less than a millimeter ( 115
Fig. 4. LC/MS/MS of urinary isoprostanes. Chromatograms (percent relative abundance versus retention time, 10–22 min) are shown for the VI-series F2-iPs and for the Class V F2-iPs (m/z 353 > 151), Class IV F2-iPs (m/z 353 > 127), and Class III F2-iPs (m/z 353 > 193); labeled peaks include iPF2α-VI, 5-epi-iPF2α-VI, 8,12-iso-iPF2α-VI, 5-epi-8,12-iso-iPF2α-VI, iPF2α-III, and 15-epi-iPF2α-III. Selected peaks were identified by comparison with synthetic standards.
cigarette smoking–related illnesses, and alcoholinduced liver disease. Isoprostanes, which are increased in low density lipoprotein when it is oxidized in vitro, have been found to be increased in the urine of asymptomatic patients with hypercholesterolemia and are present in human atherosclerotic plaque. These observations, and their measurement in urine models of atherosclerosis, raise the likelihood that they may be useful in elucidating the role of LDL and cellular protein oxidation in atherogenesis. In perhaps the most striking evidence to date of the free-radical involvement in atherosclerosis, J. Rokach and G. A. FitzGerald have shown that suppression of elevated iP generation (iPF2α-VI) in vivo reduces the progress of atherosclerosis in apolipoprotein E–deficient mice. Significantly, they have also shown in this study that the level of cholesterol remains high and unchanged. It suggests that in cholesteryl linoleate, the entity present in LDL, the cholesterol moiety is the carrier of the linoleic acid which is responsible for the oxidative stress and the subsequent cell damage. Presumably the cholesterol part is left intact. Recently, the most abundant iP in urine, 8,12-iso-iPF2α-VI (Fig. 4), has been used as an index of oxidant stress and severity of Alzheimer’s disease. See CHOLESTEROL. Garret A. FitzGerald; Joshua Rokach Bibliography. M. Adiyaman et al., Total synthesis of a novel isoprostane iPF2α-I and its identification
in biological fluids, Tetrahed. Lett., 37:4849–4852, 1996; J. P. Fessel et al., Discovery of lipid peroxidation products formed in vivo with a substituted tetrahydrofuran ring (isofurans) that are favored by increased oxygen tension, Proc. Nat. Acad. Sci. USA, 99:16713–16718, 2002; S. Kim et al., The first total synthesis of iPF4α-VI and its deuterated analog, Tetrahed. Lett., 43:2801–2805, 2002; H. Li et al., Quantitative high performance liquid chromatography/tandem mass spectrometric analysis of the four classes of F2-isoprostanes in human urine, Proc. Nat. Acad. Sci. USA, 96:13381–13386, 1999; J. D. Morrow et al., A series of prostaglandin F2-like compounds are produced in vivo in humans by a non-cyclooxygenase, free radical-catalyzed mechanism, Proc. Nat. Acad. Sci. USA, 87:9383–9387, 1990; D. Pratico et al., Vitamin E suppresses isoprostance generation in vivo and reduces atherosclerosis in ApoE-deficient mice, Nature Med., 4:1189– 1192, 1998; L. G. Quan and J. K. Cha, Preparation of isoprostanes and neuroprostanes, J. Amer. Chem. Soc., 124:12424–12425, 2002; J. Rokach et al., Nomenclature of isoprostanes: A proposal, Prostaglandins, 54:853–873, 1997; D. F. Taber, J. D. Morrow, and L. J. Roberts II, A nomenclature system for the isoprostanes, Prostaglandins, 53:63–67, 1997; E. A. Weinstein et al., Prothrombinase acceleration by oxidatively damaged phospholipids, J. Biol. Chem., 275:22925–22930, 2000.
Isoptera An order of the Insecta, commonly called termites, with the general characteristics and stages of other exopterygote insects. They are closely related to the Blattodea. Indeed, the only wood-eating cockroach, Cryptocercus, has a similar collection of protozoa in its hindgut to that of the primitive termite Zootermopsis. Similarly, the Australian Mastotermes, though socially complex, is considered the most primitive extant termite by reason of its having an anal lobe in the adult wing; this character is found throughout the cockroaches. Approximately 2000 species of termites have been described and these are placed in six or seven families: Mastotermitidae, Hodotermitidae, Termopsidae (sometimes included in the Hodotermitidae), Kalotermitidae, Rhinotermitidae, Serritermitidae and the Termitidae. The latter family represents the higher termites and includes over 80% of all known termite species. See EXOPTERYGOTA; ORTHOPTERA. The termite group is typically a tropical one, but certain genera do occur outside the tropics and may be found as far north as British Columbia and Ontario. The group is distinguished by the fact that all species are eusocial and all feed on cellulose. The castes, apart from the imagos and primary reproductives, are drawn from the larval forms. In this respect they differ from the Hymenoptera where the castes are all variants of adults. Termites live in nests of varying degrees of complexity, ranging from large exigeous mounds to diffuse or temporary galleries in wood or soil. See HYMENOPTERA. Anatomical characteristics. The mature termite (alate or imago) has membranous wings which extend beyond the end of the abdomen. There is a pair of compound eyes, and a pair of ocelli is present in most groups. The wings are superimposed flat on the abdomen when the insect is not in flight. Flight is weak and fluttering and is usually short. When the alate alights, the wings are shed along a basal suture with the base of each wing (the wing scale) being retained. The alates vary in color from yellow, through brown, to coal black. Some species (usually lightly pigmented) fly during the night; others (usually heavily pigmented) fly during the day. The time of flight varies from species to species both with respect to the season of the year and the time of the day or night. Soldier caste. In almost all termite species a second type of individual is produced in the colony. This is the soldier, which lacks wings, is nonreproductive, and is variously modified for defense. There are four rather distinct types of soldiers: mandibulate, phragmotic, nasutoid, and nasute. In mandibulate soldiers the head and mandibles are greatly enlarged and heavily sclerotized (Fig. 1a). The mandibles may be biting, snapping, or pincherlike and more or less symmetrical or strongly asymmetrical. In phragmotic soldiers the mandibles are not as conspicuously enlarged as in mandibulate forms. The head is high and truncate in front and is used to plug openings in the workings.
Fig. 1. Head capsule and antennae of (a) mandibulate soldier of Zootermopsis nevadensis showing enlarged mandibles, and (b) nasute soldier of Lacessititermes palawanensis. Scale bar 2 mm.
In the families Rhinotermitidae and Termitidae there is a cephalic gland which opens via a small pore on the dorsal head surface. In some groups this gland and its opening have been variously modified for defense in the soldiers. In some rhinotermitid soldiers the opening of the gland lies at the anterior margin of the head, and the fluid is discharged into an open troughlike structure which extends forward from the head capsule. These have been termed nasutoid soldiers. Certain species have both mandibulate and nasutoid forms. Finally, in the termitid subfamily Nasutitermitinae the cephalic gland opens at the tip of an elongated tube or nasus which projects anteriorly, giving the head the appearance of a pear-shaped syringe (Fig. 1b). These are the nasute soldiers. Soldiers are preceded during development by an intermediate stage (the soldier-nymph or white soldier) which is soldierlike in form but is unsclerotized. In general, mandibulate soldiers are rather large and occur in relatively small numbers in the colonies, whereas nasute soldiers are relatively minute and constitute as much as 30% of the population. Worker caste. In the more advanced termites there is a third caste, the worker. True workers usually have some pigmentation as opposed to the immature termites, which are generally white. Workers lack wings, are nonreproductive, and have mandibles which resemble those of the imagos; they are usually blind. In many lower termites there is no distinct worker caste, and the work of the colony is performed by young individuals which have undergone at least two stages of development. These are still capable of becoming either alates or soldiers, but may continue to molt without becoming differentiated. Eventually they may undergo stationary molts with no appreciable change in size or structure. These “stabilized” individuals, which function as workers but are still capable of differentiation into other castes, have been termed pseudergates or pseudoworkers.
Replacement reproductives. In addition to the definitive castes (alate, soldier, and worker) another type of individual may occur in the colony under certain circumstances. These individuals are the supplementary or replacement reproductives. Although the original pair (king and queen) may live for two or three decades, the life of the colony itself is not limited by their survival. If one or both are lost, other individuals in the colony become reproductive. If these new reproductives have wing buds, they are termed nymphoid (or brachypterous or second-form) reproductives. If they lack wing buds, they are termed ergatoid (or apterous or third-form) reproductives. Physiology. Two main areas of physiology which are of particular interest in Isoptera are nutrition and the development of the different castes. Nutrition. Cellulose is the prime source of nutrition in termites, and may be obtained in nature as wood, humus in soil, or as grass. This consumption of cellulose, while most beneficial in natural areas, especially in the tropics, has caused some of the wood- and grass-eating forms to be considered pests. The cellulose is masticated and passed through the gut to the hind intestinal caecum where it is digested by anaerobic protozoa (in the lower termites) and by bacteria (in the higher termites). Most researchers now believe that the cellulose is degraded to glucose and then to acetate which is able to pass through the chitinous lining of the hindgut where the large glucose molecule could not. The older idea that the glucose in the hindgut passes forward through a valve to the midgut for absorption by active transport is not now supported. Nitrogen is obtained and conserved by both cannibalism and by adventitious consumption of fungi. Recently it has also been demonstrated that some of the symbiotic bacteria are able to fix atmospheric nitrogen. Uric acid is conserved and plays an important role as a nitrogen source. Caste development. In many families of termites when the functional reproductives are removed from a colony some individuals that remain can transform into egg-laying and sperm-producing forms while retaining larval character; these are the replacement (or neotenic) reproductives. In 1932 it was proposed that the functional reproductives produce a chemical substance (pheromone) that inhibits the development of the gonads of the larvae. The hypothesis was elaborated over the years to include the involvement of twelve pheromones in the caste regulation of Kalotermes and Zootermopsis. This complex hypothesis, however, was quite speculative, as none of the postulated pheromones has yet been identified nor their origin discovered. Work by other investigators has discounted many of the earlier experiments. It is known, however, that vitellogenesis is affected by the presence of reproductives and that juvenile hormone plays an important part in the regulation; the mandibular glands may be involved. There probably is an inhibitory pheromone that requires contact or ingestion. See NEOTENY; PHEROMONE. Soldier development is more complex. All that is known about it is that injection of a high titer of
juvenile hormone into the larvae of some species will cause soldiers to form. Biology. Eggs laid by a female reproductive hatch into mobile six-legged larvae which are broadly similar in form to the adult. They undergo successive molts over time and the later stages have external wing buds. The wing-budded forms finally molt into winged imagos with dark sclerotized bodies and well developed eyes. These imagos (alates) are produced at certain times of the year, and under certain weather conditions leave the nest all at once in a dispersal flight. The alates fly poorly and eventually settle on the ground or vegetation. Most species then lose their wings by actively twisting their bodies and breaking each wing along a basal suture line. The imagos with short wing stubs, now known as dealates, form into tandems, that is, a male following a female, and move to crevices in soil or wood. There they begin to excavate a small cavity, the nuptial cell or copularium, where mating takes place. The male and female are now referred to as the king and queen, and unlike the hymenopterans the male persists in the colony. Eggs are laid, hatch, and the new larval individuals are the nucleus of a new colony. Not all larvae in a colony, however, become alate imagos. In most termites larvae may transform into soldiers and workers, while in many species larval forms may become sexually mature without becoming true adults. The main castes found in the Isoptera are these larval forms and, in this order particularly, they can be thought of as end points in individual development: once a termite transforms into a certain caste it loses ability to molt again. The soldier is designed for active defense while the worker is for excavation, feeding, and caring for other castes and individuals. As described above, some genera lack soldiers, some have more than one type of soldier, and some lower termites lack true workers. While new colonies may be formed by imagos after a flight, in many genera new colonies can also be formed by a process known as sociotomy. This occurs when part of a colony is cut off from the main group by chance, or by the colony having grown so big as to stress the communication system that holds it together, or, as happens in some genera, by the active emigration of part of a colony. All termites consume cellulose, but in some instances the source is not dead wood. Many termites, such as Cornitermes, ingest humus; the harvesting termites (for example, Hodotermes in Africa and Drepanotermes in Australia) collect and consume grass, while the spectacular Macrotermes of Africa construct fungus gardens. Behavior. As in all social insects, communication is essential to their behavior. Colony cohesion and recognition, defense, construction of galleries and nests, pairing, and mating all are mediated through communication. Visual communication occurs only in adults, where the eyes are well developed; in other castes, where the eyes are rudimentary, communication is by chemical and tactile means. See ANIMAL COMMUNICATION; CHEMICAL ECOLOGY; SOCIAL INSECTS.
Fig. 2. Posture of a Zootermopsis nymph (a) during normal activity and (b) while laying a trail, with the sternal gland indicated. (After K. Krishna and F. Weesner, eds., Biology of Termites, vol. 1, 1969)
Alarm and defense. Many termites are known to communicate alarm by a combination of characteristic tactile movements coupled with the laying of a chemical trail. The trail has its origin in the secretion of a pheromone by the sternal gland or glands of the abdomen (Figs. 2 and 3). A greater or lesser number of termites, depending on the intensity of the initial excitation, are recruited to the spot where the original disturbance occurred by following the trail after having been alerted by physical contact with an excited individual or individuals. This type of communication has also been found in other social insects and has been termed alarm recruitment. Once a termite has been recruited to a specific locus by this mechanism, the action it then takes is dictated by factors at the site of disturbance. If the cause of alarm is a break in the nest or gallery, then building (in the form of fecal material deposition and the placement of debris) will occur by workers, while soldiers guard the site. If an intruder was the cause, then snapping will be exhibited by workers and mandibulate soldiers; soldiers of the Nasutitermitidae will eject a sticky terpene-containing substance from a gland situated in their heads. When the intruder is immobilized or killed, less excitation is generated and the termites’ response is to bury it by depositing fecal material and debris as in a gallery break. The response of a colony to alarm is further tuned by the phenomenon of polyethism, that is, the differential response of castes, and of individuals of differing ages within a caste, to the same stimulus. This phenomenon is seen most spectacularly in many species of Nasutitermes where in a localized disturbance there is an initial recruitment of soldiers as described above, but at the same time the workers retreat. In this case polyethism occurs partly in response to the frontal gland secretion of the soldiers, which has been shown to act as a short-lived recruitment pheromone for other soldiers in addition to its immobilizing function on intruders, and partly in the response to the sternal gland pheromone. The
latter pheromone, when combined with the communication of other information such as the presence of food, is responsible for exciting the workers. See PHEROMONE. Colony and kin recognition. Communication of kinship and membership of a colony is known to involve an odor characteristic of the colony. This odor is a composite one with contributions from pheromones, the recognition odors of various castes, the environment, and food. Intruders, including conspecifics from other colonies, are recognized and attacked when their odor is detected and does not match the generalized acquired colony odor. Colony odor is important in identifying aliens, in intraspecific competition, and in enhancing the degree of relationship between colony members by favoring inbreeding. The maintenance of a high degree of inbreeding would in turn favor the evolution of sociality by kin selection in these social insects with normal diploid sex determination. Pairing and mating. Following the dispersal flight, pairing of male and female imagos takes place without copulation; this occurs after the nuptial cell has been formed. Typically a female de-alate will take up a calling position with the tip of her abdomen pointing vertically, exposing the sternal gland. (In imagos this gland has been shown to produce an excitatory pheromone for males.) A male bumping into a calling female becomes very excited and will tend to follow another moving termite. Upon being bumped, the female lowers her abdomen and begins running and the male follows her in tandem. If the male loses antennal contact with the female she will stop and resume the calling position while he moves randomly; once more contacting the female the process is repeated. Excavation occurs when the pair reaches wood or a crevice. In certain African termites pairing has a distinct visual component: before losing their wings, the females alight on stems of grass and flutter their wings in a typical display which attracts the males. Mating takes place in the nuptial cell and occurs at intervals throughout the life of the reproductives. Copulation occurs in a rear-to-rear position.
Fig. 3. Section through the abdomen of a nymph of Zootermopsis nevadensis, showing the sternal gland; labeled structures include the gut, Malpighian tubule, fat body, ganglion, muscle, abdominal sternites III, IV, and V, the sternal gland, and the external reservoir with its opening; scale 5 mm. (After K. Krishna and F. Weesner, eds., Biology of Termites, vol. 1, 1969)
Construction behavior. Building behavior has many manifestations, from the construction of simple galleries to the construction of the large bisymmetrical "magnetic" mounds made by Amitermes meridionalis in northern Australia. These mounds take up a north-south longitudinal axis which may occur through differential building activity in relation to a heat stimulus, as has been shown to occur with another termite species in the laboratory. The shape and orientation have evolved as a temperature-regulating system in a termite that is unable to descend into the soil to avoid the heat (as other members of its genus do) because it is found in a habitat that is seasonally inundated. For all termites, the basic act of building occurs as a response to a specific external stimulus. In some instances this primary building results in construction which, through its particular shape or chemical composition or both, acts as a new stimulus for further construction; this phenomenon is called "stigmergy." The building will stop when the results of building eventually reduce the original stimulus that initiated the building. This behavioral feedback applies in this situation and in alarm recruitment. Evolution of sociality. Much has been proposed to explain the evolution of social behavior in the insects, and sociobiologists have heavily endorsed the idea that the haplodiploid form of sex determination found in the Hymenoptera is the basis of insect sociality. The termites, however, are diploid in both sexes and so destroy the generality of the hypothesis. Cytological studies, nonetheless, have inferred an exceptionally close genetic relationship between members of a termite colony, and this could partially explain the evolution of sociality and the concomitant altruistic behavior of nonsexual forms. Probably several factors were involved in the evolution of sociality in termites, including the need to nurture the cellulose-degrading fauna of the hindgut. The similarity of the gut fauna of the wood-eating cockroach Cryptocercus to that of Zootermopsis supports the importance of diet specificity in the evolution of the social termites from a cockroach ancestor. See INSECTA; SOCIOBIOLOGY. Alastair M. Stuart Bibliography. K. Krishna and F. M. Weesner (eds.), Biology of Termites, vol. 1, 1969, vol. 2, 1970; A. M. Stuart, The role of pheromones in the initiation of foraging, recruitment and defence by soldiers of a tropical termite, Nasutitermes corniger (Motschulsky), Chem. Senses, 6:409–420, 1981.
Isopycnic The line of intersection of an atmospheric isopycnic surface with some other surface, for instance, a surface of constant elevation or pressure. An isopycnic surface is a surface in which the density of the air is constant. Since specific volume is the reciprocal of density, isosteric surfaces coincide with isopycnic surfaces. On a surface of constant pressure, isopycnics coincide with isotherms, because on such a
surface, density is a function solely of temperature. On a constant-pressure surface, isopycnics lie close together when the field is strongly baroclinic and are absent when the field is barotropic. See BAROCLINIC FIELD; BAROTROPIC FIELD; SOLENOID (METEOROLOGY). Frederick Sanders; Howard B. Bluestein Bibliography. J. R. Holton, An Introduction to Dynamic Meteorology, 4th ed., 2004; G. Visconti, Fundamentals of Physics and Chemistry of the Atmosphere, 2001.
Isostasy The application of Archimedes’ principle to the layered structure of the Earth. The elevated topography of Earth is roughly equivalent to an iceberg that floats in the surrounding, denser water. Just as an iceberg extends beneath the exposed ice, the concept of isostasy proposes that topography is supported, or compensated, by a deep root. The buoyant outer shell of the Earth, the crust, displaces the denser, viscous mantle in proportion to the surface elevation. Isostasy implies the existence of a level surface of constant pressure within the mantle, the depth of compensation. Above this surface the mass of any vertical column is equal. Equal pressure at depth can also be achieved by varying density structure or by the regional deflection of the lithosphere. See ARCHIMEDES’ PRINCIPLE; EARTH CRUST. This theory was independently proposed to explain inconsistencies in geodetic observations made by P. Bouguer (1735) in Peru and G. Everest (1836) in India. Each set of observations depended critically on the determination of vertical from a suspended plumb bob. In the absence of mass anomalies or topography, the bob will orient itself in the local direction of gravity, perpendicular to the surface of the Earth. Early geodesists recognized that mountain ranges such as the Andes and the Himalayas should deflect the plumb bob toward the mountains. Subsequent, independent work by Pratt (1855) demonstrated that the deflection was less than predicted by calculations, indicating that the gravitational attraction of the topography was somehow diminished. One explanation that was advanced (Airy isostasy; illus. a) for this discrepancy advocated a large iceberglike root of constant density, the mirror image of the surface topography. An alternative view (Pratt isostasy; illus. b) suggested that the lower interface was horizontal and that the density of the mountains was less than the density of the surrounding level regions. These theories, advanced in 1855, served to reduce errors in geodetic surveys, but they were unconfirmed until the direct measurement of crustal thickness beneath mountain ranges using seismic refraction techniques began in the late 1920s. These studies of crustal thickness have confirmed that mountains are being underlain by thick roots. See EARTH, GRAVITY FIELD OF; GEODESY. Local isostasy achieves equilibrium directly beneath a load by varying either the density or thickness of that mass column. This model attributes no
Three major modes of isostatic compensation. (a) Airy isostasy, where the crustal density is constant beneath both the elevated topography and the level region; a large root extends beneath the elevated topography, and the depth of compensation is at the base of the crust where the pressure is constant. (b) Pratt isostasy, where the density of the crust varies inversely with the height of the topography; the depth of compensation is at the base of the horizontal crust–mantle boundary. (c) Flexural or regional isostasy, where the crust has some strength and is deflected beneath the elevated topography; the depth of compensation is a horizontal surface beneath the lowest extent of the crust.
inherent strength to the crust and assumes that the mantle is a simple fluid, redistributing mass to minimize pressure differences at depth. From studies of seamounts, oceanic trenches, foreland basins, and glacial rebound, it has become known that the outer shell of the Earth is rigid, responding to loads over a region broader than the load itself, and that the mantle is a viscous fluid with a time-dependent response to loads. The simplest method of examining the response of the Earth is to study an area influenced by a discrete load such as a seamount or a continental glacier. If local isostasy (Pratt and Airy) is applicable, the region surrounding the load will be horizontal, unaffected by the load. In contrast, if the lithosphere has finite strength and regional or flexural isostasy (illus. c) is applicable, the surrounding regions will be deflected down toward the load. Gravity, bathymetry, and seismic studies of the crust surrounding Hawaii and other seamounts have demonstrated that the crust is downwarped beneath seamounts. The implication of this regional response is that the oceanic lithosphere has some strength and that the Earth’s outer shell behaves elastically. Just as seamounts can be considered loads on oceanic crust, so can continental ice sheets be
viewed as loads on continental crust. Since the glacial retreat, Fennoscandia and Canada, which were directly beneath continental ice sheets, have experienced rapid uplift, as recorded by the series of beaches now resting tens to hundreds of meters above sea level. Uplift in this region directly beneath the previously loaded region confirms the prediction that the Earth’s surface is in hydrostatic equilibrium, but does not discriminate between the local and regional isostasy. Beyond the furthest extent of the ice sheet, raised beaches are also found, indicating that the crustal deformation associated with the ice loading was regional rather than local. This regional deflection provides compelling evidence for lithospheric strength and regional isostasy. See EARTH; LITHOSPHERE. Robin E. Bell; Bernard J. Coakley Bibliography. C. M. R. Fowler, The Solid Earth: An Introduction to Global Geophysics, 2d ed., 2005; R. Sabadini, K. Lambeck, and E. Boschi (eds.), Glacial Isostasy, Sea-Level and Mantle Rheology, 1991; B. J. Skinner and S. C. Porter, The Dynamic Earth: An Introduction to Physical Geology, 5th ed., 2003; A. B. Watts, Isostasy & Flexure of the Lithosphere, 2001; A. B. Watts et al., A multichannel seismic study of lithospheric flexure across the Hawaii-Emperor seamount chain, Nature, 313(6015):105–111, 1985.
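The Airy balance described above can be made quantitative: equating the mass per unit area of an elevated column and a reference column above the depth of compensation gives a root thickness r = ρc h/(ρm − ρc) for topography of height h. The Python sketch below is a minimal illustration (not part of the original article); the crust and mantle densities are assumed representative values.

```python
# Airy isostasy sketch: the crustal root r beneath topography of height h satisfies
# rho_crust * h = (rho_mantle - rho_crust) * r, so every column has equal mass
# above the depth of compensation. Densities are assumed representative values.
RHO_CRUST = 2800.0    # kg/m^3 (assumed)
RHO_MANTLE = 3300.0   # kg/m^3 (assumed)

def airy_root(height_m: float) -> float:
    """Thickness (m) of the compensating root for a given surface elevation."""
    return RHO_CRUST * height_m / (RHO_MANTLE - RHO_CRUST)

for h in (1000.0, 3000.0, 5000.0):    # elevations in meters
    print(f"elevation {h:5.0f} m -> root {airy_root(h) / 1000:.1f} km")
```

With these densities the root is about 5.6 times the surface elevation, which is why mountain belts are underlain by crustal roots far thicker than the relief they support.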
Isotach A line along which the speed of the wind is constant. Isotachs are customarily represented on surfaces of constant elevation or atmospheric pressure, or in vertical cross sections. The closeness of spacing of the isotachs is indicative of the intensity of the wind shear on such surfaces. In the region of a jet stream the isotachs are approximately parallel to the streamlines of wind direction and are closely spaced on either side of the core of maximum speed. See JET STREAM; WIND. Frederick Sanders; Howard B. Bluestein
Isothermal chart A map showing the distribution of air temperature (or sometimes sea surface or soil temperature) over a portion of the Earth’s surface, or at some level in the atmosphere. On it, isotherms are lines connecting places of equal temperature. The temperatures thus displayed may all refer to the same instant, may be averages for a day, month, season, or year, or may be the hottest or coldest temperatures reported during some interval. Maps of mean monthly or mean annual temperature for continents, hemispheres, or the world sometimes show values reduced to sea level to eliminate the effect of elevation in decreasing average temperature by about 3.3◦F/1000 ft (5.9◦C/1000 m; see illus.). Such adjusted or sea-level maps represent the effects of latitude, continents, and oceans in modifying temperature; but they conceal the effect of mountains and highlands on temperature
Chart showing selected January average isotherms, labeled from 10°F (−12°C) to 80°F (27°C). (After D. K. Fellows, Our Environment: An Introduction to Physical Geography, 2d ed., John Wiley and Sons, 1980)
distributions. The first isothermal chart, prepared by Alexander von Humboldt in 1817 for low and middle latitudes of the Northern Hemisphere, was the first use of isopleth methods to show the geographic distribution of a quantity other than elevation. These maps are now varied in type and use. Isothermal charts are drawn daily in major weather forecasting centers; 5-day, 2-week, and monthly charts are used regularly in long-range forecasting; mean monthly and mean annual charts are compiled and published by most national weather services, and are presented in standard books on, for example, climate, geography, and agriculture. See AIR TEMPERATURE; TEMPERATURE INVERSION; WEATHER FORECASTING AND PREDICTION. Arnold Court
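The sea-level reduction mentioned above is a simple linear correction. The Python sketch below is illustrative only (the station values are assumed); it applies the quoted average lapse rate of about 5.9°C per 1000 m to reduce a station temperature to sea level before it is plotted on an isothermal chart.

```python
# Illustrative sketch: reducing a station temperature to sea level with the
# average lapse rate quoted in the article (~5.9 C per 1000 m). Inputs assumed.
LAPSE_RATE_C_PER_M = 5.9 / 1000.0

def sea_level_temperature(t_station_c: float, elevation_m: float) -> float:
    """Approximate temperature the station would report at sea level."""
    return t_station_c + LAPSE_RATE_C_PER_M * elevation_m

print(f"{sea_level_temperature(2.0, 1500.0):.1f} C")   # a 2 C reading at 1500 m maps to ~10.9 C
```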
Isothermal process A thermodynamic process that occurs with a heat addition or removal rate just adequate to maintain constant temperature. The change in internal energy per mole, U, accompanying a change in volume in an isothermal process is given by Eq. (1),

U2 − U1 = ∫ [T(∂P/∂T)V − P] dV, integrated from V1 to V2   (1)

where T is the temperature, P is pressure, and V is the volume per mole. The integral in Eq. (1) is zero for an ideal gas (which has the equation of state PV = nRT, where R is the gas constant and n the amount, so that the integrand vanishes), and for an incompressible condensed phase (solid or a liquid) for which the volume does not change with pressure. Thus, in both these cases, U2 = U1. For real gases and compressible liquids, the integral is nonzero, and the internal energy change is computed using a volumetric equation of state, that is, a relation between pressure, temperature, and volume. The change in entropy, S, in an isothermal process is given by Eq. (2).

S2 − S1 = ∫ (∂P/∂T)V dV = −∫ (∂V/∂T)P dP, with the first integral taken from V1 to V2 and the second from P1 to P2   (2)

For condensed phases with a small coefficient of thermal expansion, S2 ≈ S1, while for real fluids the equation of state is used in Eq. (2) to obtain the numerical value of the entropy change. Equation (3) is used for the isothermal entropy change of an ideal gas.

S2 − S1 = nR ln (V2/V1) = nR ln (P1/P2)   (3)

In the case of a reversible isothermal process, the work done, W, on the fluid and the transferred heat to the fluid, Q, are given by Eqs. (4) and (5).

W = −∫ P dV = −Q + (U2 − U1), with the integral taken from V1 to V2   (4)

Q = T(S2 − S1)   (5)

The sign convention used here is that heat supplied to the fluid and work done on the fluid are positive. For a reversible isothermal process in an ideal gas, these equations reduce to Eq. (6).

W = −Q = −nRT ln (V2/V1) = +nRT ln (P2/P1)   (6)

The comparable expressions for real fluids, which depend on the equation of state, can be found in thermodynamics textbooks. Phase transitions (that is, first-order phase transitions such as solid to liquid, or liquid to vapor) in a pure material occur at constant temperature and constant pressure according to the Gibbs phase rule. The Gibbs free energy change for a first-order phase transition, ΔG, is identically zero at the transition temperature, and the enthalpy and entropy changes, ΔH and ΔS respectively, are related as given in Eq. (7).

ΔG = 0 = ΔH − TΔS, or ΔS = ΔH/T = (ΔU + PΔV)/T   (7)
See CHEMICAL THERMODYNAMICS; GAS; PHASE RULE; PHASE TRANSITIONS; THERMODYNAMIC PROCESSES. Stanley I. Sandler Bibliography. S. I. Sandler, Chemical and Engineering Thermodynamics, 3d ed., 1999; R. E. Sonntag, C. Borgnakke, and G. J. Van Wylen, Fundamentals of Thermodynamics, 5th ed., 1997.
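To make Eqs. (3)–(6) concrete, the Python sketch below (illustrative only; the amount, temperature, and pressures are assumed) evaluates the entropy change, heat, and work for the reversible isothermal compression of an ideal gas, for which U2 = U1 so that W = −Q.

```python
# Illustrative sketch: reversible isothermal compression of an ideal gas.
# Sign convention as in the article: heat supplied to the fluid and work done
# on the fluid are positive. All numerical inputs are assumed values.
import math

R = 8.314                 # gas constant, J/(mol K)
n = 1.0                   # mol (assumed)
T = 298.15                # K (assumed)
P1, P2 = 1.0e5, 5.0e5     # initial and final pressures, Pa (assumed)

dS = n * R * math.log(P1 / P2)   # Eq. (3): S2 - S1 = nR ln(P1/P2), negative for compression
Q = T * dS                       # Eq. (5): heat supplied to the gas (negative: heat rejected)
W = -Q                           # Eq. (6): work done on the gas (U2 - U1 = 0 for an ideal gas)

print(f"Entropy change = {dS:+.2f} J/K")
print(f"Heat to gas    = {Q:+.0f} J")
print(f"Work on gas    = {W:+.0f} J")
```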
Isotone One of two or more atoms which display a constant difference A – Z between their mass number A and their atomic number Z. Thus, despite differences in the total number of nuclear constituents, the numbers of neutrons in the nuclei of isotones are the same. The numbers of naturally occurring isotones provide useful evidence concerning the stability of particular neutron configurations. For example, the relatively large number (six and seven, respectively) of naturally occurring 50- and 82-neutron isotones suggests that these nuclear configurations are especially stable. On the other hand, from the fact that most atoms with odd numbers of neutrons are anisotonic, one may conclude that odd-neutron configurations are relatively unstable. See NUCLEAR STRUCTURE. Henry E. Duckworth
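Because isotones are defined by a constant neutron number N = A − Z, grouping nuclides by that difference identifies them immediately. The short Python sketch below is purely illustrative (the nuclide list is a small assumed sample); it collects a few naturally occurring nuclides into isotone families by computing A − Z.

```python
# Illustrative sketch: grouping nuclides into isotones by neutron number N = A - Z.
# The nuclide list is a small assumed sample for demonstration.
from collections import defaultdict

nuclides = [("Ca", 20, 40), ("K", 19, 39), ("Ar", 18, 38), ("S", 16, 36)]  # (symbol, Z, A)

isotones = defaultdict(list)
for symbol, Z, A in nuclides:
    isotones[A - Z].append(f"{A}{symbol}")

for N, members in sorted(isotones.items()):
    print(f"N = {N}: {', '.join(members)}")   # all four share N = 20
```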
Isotope One member of a (chemical-element) family of atomic species which has two or more nuclides with the same number of protons (Z) but a different number of neutrons (N). Because the atomic mass is determined by the sum of the number of protons and neutrons contained in the nucleus, isotopes differ in mass. Since they contain the same number of protons (and hence electrons), isotopes have the same chemical properties. However, the nuclear and atomic properties of isotopes can be different. The electronic energy levels of an atom depend upon the nuclear mass. Thus, corresponding atomic levels of isotopes are slightly shifted relative to each other. A nucleus can have a magnetic moment which can interact with the magnetic field generated by the electrons and lead to a splitting of the electronic levels. The number of resulting states of nearly the same energy depends upon the spin of the nucleus
and the characteristics of the specific electronic level. See ATOMIC STRUCTURE AND SPECTRA; HYPERFINE STRUCTURE; ISOTOPE SHIFT. Of the 108 elements reported thus far, 81 have at least one stable isotope whereas the others exist only in the form of radioactive nuclides. Some radioactive nuclides (for example, 115In, 232Th, 235U, 238 U) have survived from the time of formation of the elements. Several thousand radioactive nuclides produced through natural or artificial means have been identified. See RADIOISOTOPE. Of the 83 elements which occur naturally in significant quantities on Earth, 20 are found as a single isotope (mononuclidic), and the others as admixtures containing from 2 to 10 isotopes. Isotopic composition is mainly determined by mass spectroscopy. See MASS SPECTROSCOPE. Nuclides with identical mass number (that is, A = N + Z) but differing in the number of protons in the nucleus are called isobars. Nuclides having different mass number but the same number of neutrons are called isotones. See ISOBAR (NUCLEAR PHYSICS); ISOTONE. Nuclear stability. The stability of a nuclide is governed by its total energy E as given by the Einstein relation E = Mc2, where M is the nuclidic mass and c is the velocity of light. If E is less than the combined energies of possible decay products, the nuclide will be stable. A major factor in determining stability is the relative strength of the nuclear force which acts to attract nucleons and the coulombic force (repulsive) which arises from the electric charge on the protons. Nuclides with an even number of protons or neutrons are prevalent in the table. Of the 287 nuclides tabulated, 168 are even-even (that is, an even number of both neutrons and protons), 110 are odd, and only 9 are odd-odd. This demonstrates the increased attraction of the nuclear force between pairs of nucleons of the same type (the pairing effect). Nuclides for which the number of either protons or neutrons (or both) comprises so-called magic numbers (for example, 8, 20, 50, 82, etc.) have increased stability. See NUCLEAR STRUCTURE. Isotopic abundance. The term isotopic abundance refers to the isotopic composition of an element found in its natural terrestrial state. The isotopic composition for most elements does not vary much from sample to sample. This is true even for samples of extraterrestrial origin such as meteorites and lunar materials brought back to Earth by United States crewed and Soviet uncrewed missions. However, there are a few exceptional cases for which variations of up to several percent have been observed. There are several phenomena that can account for such variations, the most likely being some type of nuclear process which changes the abundance of one isotope relative to the others. For some of the lighter elements, the processes of distillation or chemical exchange between different chemical compounds could be responsible for isotopic differences. See NUCLEAR REACTION; RADIOACTIVITY. The lead isotopes 206Pb, 207Pb, and 208Pb are stable and are end products of naturally decaying 238U,
Isotope Natural isotopic compositions of the elements Atomic no.
Element symbol
1
H*
2
He*
3
Li*
4 5
Be B*
6
C*
7
N*
8
O
9 10
F* Ne*†
11 12
Na Mg
13 14
Al Si
15 16
P S*
17
Cl
18
Ar†
19
K
20
Ca
21 22
Sc Ti
23
V
24
Cr
25 26
Mn Fe
27 28
Co Ni
29
Cu
30
Zn
Mass no.
Isotopic abundance, %
Atomic no.
Element symbol
Mass no.
Isotopic abundance, %
1 2 3 4 6 7 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 36 35 37 36 38 40 39 40 41 40 42 43 44 46 48 45 46 47 48 49 50 50 51 50 52 53 54 55 54 56 57 58 59 58 60 61 62 64 63 65 64 66 67 68 70
99.985 0.015 0.000138 99.999862 7.5 92.5 100 19.9 80.1 98.90 1.10 99.634 0.366 99.762 0.038 0.200 100 90.51 0.27 9.22 100 78.99 10.00 11.01 100 92.23 4.67 3.10 100 95.02 0.75 4.21 0.02 75.77 24.23 0.337 0.063 99.600 93.2581 0.0117 6.7302 96.941 0.647 0.135 2.086 0.004 0.187 100 8.0 7.3 73.8 5.5 5.4 0.250 99.750 4.35 83.79 9.50 2.36 100 5.8 91.72 2.2 0.28 100 68.27 26.10 1.13 3.59 0.91 69.17 30.83 48.6 27.9 4.1 18.8 0.6
31
Ga
32
Ge
33 34
As Se
35
Br
36
Kr†
37
Rb
38
Sr†
39 40
Y Zr
41 42
Nb Mo
44
Ru
45 46
Rh Pd
47
Ag
48
Cd
49
In
50
Sn
69 71 70 72 73 74 76 75 74 76 77 78 80 82 79 81 78 80 82 83 84 86 85 87 84 86 87 88 89 90 91 92 94 96 93 92 94 95 96 97 98 100 96 98 99 100 101 102 104 103 102 104 105 106 108 110 107 109 106 108 110 111 112 113 114 116 113 115 112 114 115 116 117 118 119 120 122 124
60.1 39.9 20.5 27.4 7.8 36.5 7.8 100 0.9 9.0 7.6 23.5 49.6 9.4 50.69 49.31 0.35 2.25 11.6 11.5 57.0 17.3 72.165 27.835 0.56 9.86 7.00 82.58 100 51.45 11.27 17.17 17.33 2.78 100 14.84 9.25 15.92 16.68 9.55 24.13 9.63 5.52 1.88 12.7 12.6 17.0 31.6 18.7 100 1.02 11.14 22.33 27.33 24.46 11.72 51.839 48.161 1.25 0.89 12.49 12.80 24.13 12.22 28.73 7.49 4.3 95.7 1.0 0.7 0.4 14.7 7.7 24.3 8.6 32.4 4.6 5.6
Isotope Natural isotopic compositions of the elements (cont.) Atomic no.
Element symbol
Mass no.
Isotopic abundance, %
Atomic no.
Element symbol
Mass no.
51
Sb
Ta Yb
Te
57.3 42.7 39.9 0.096 2.60 0.908 4.816 7.14 18.95 31.69 33.80 100 0.10 0.09 1.91 26.4 4.1 21.2 26.9 10.4 8.9 100 0.106 0.101 2.417 6.592 7.854 11.23 71.70 0.09 99.91 0.19 0.25 88.48 11.08 100 27.13 12.18 23.80 8.30 17.19 5.76 5.64 3.1 15.0 11.3 13.8 7.4 26.7 22.7 47.8 52.2 0.20 2.18 14.80 20.47 15.65 24.84 21.86 100 0.06 0.10 2.34 18.9 25.5 24.9 28.2 100 0.14 1.61 33.6 22.95 26.8 14.9
69 70
52
121 123 71 120 122 123 124 125 126 128 130 127 124 126 128 129 130 131 132 134 136 133 130 132 134 135 136 137 138 138 139 136 138 140 142 141 142 143 144 145 146 148 150 144 147 148 149 150 152 154 151 153 152 154 155 156 157 158 160 159 156 158 160 161 162 163 164 165 162 164 166 167 168 170
71
Lu
72
Hf
73
Ta
74
W
75
Re
76
Os†
77
Ir
78
Pt
79 80
Au Hg
81
Tl
82
Pb†
83 90 92
Bi Th U*
169 168 170 171 172 173 174 176 175 176 174 176 177 178 179 180 180 181 180 182 183 184 186 185 187 184 186 187 188 189 190 192 191 193 190 192 194 195 196 198 197 196 198 199 200 201 202 204 203 205 204 206 207 208 209 232 234 235 236
53 54
55 56
l Xe†
Cs Ba
57
La
58
Ce
59 60
62
Pr Nd
Sm
63
Eu
64
Gd
65 66
Tb Dy
67 68
Ho Er
Er
Isotopic abundance, % 100 0.13 3.05 14.3 21.9 16.12 31.8 12.7 97.40 2.60 0.16 5.2 18.6 27.1 13.74 35.2 0.012 99.988 0.13 26.3 14.3 30.67 28.6 37.40 62.60 0.02 1.58 1.6 13.3 16.1 26.4 41.0 37.3 62.7 0.01 0.79 32.9 33.8 25.3 7.2 100 0.15 10.1 17.0 23.1 13.2 29.65 6.8 29.524 70.467 1.4 24.1 22.1 52.4 100 100 0.0055 0.7200 99.2745
*Isotopic composition may vary with sample depending upon geological or biological origin.
†Isotopic composition may vary with sample because some of the isotopes may be formed as a result of radioactive decay or nuclear reactions.
235
U, and 232Th, respectively, whereas 204Pb is not produced by any long-lived decay chain. Thus, the isotopic composition of lead samples will depend upon their prior contact with thorium and uranium. The potassium isotope 40K has a half-life of 1.28 × 109 years and decays by beta-ray emission to 40Ar and by electron capture to 40Ca, which are both stable. This can cause the argon in potassium-bearing minerals to differ in isotopic abundance from that found in air. It is possible to determine the age of rocks by measuring the ratio of their 40K/40Ar content. This ratio technique can also be used for rock samples which bear other long-lived, naturally occurring isotopes such as 87Rb (rubidium), thorium, and uranium. See LEAD ISOTOPES (GEOCHEMISTRY); ROCK AGE DETERMINATION. An interesting example of anomalous isotopic compositions has been observed in the Oklo uranium deposit in Gabon (western Africa). Based upon extensive research, it has been concluded that this is the site of a natural chain reaction that took place about 1.8 × 109 years ago. Much of the uranium in this formation has been depleted of the fissionable isotope 235U. The isotopic composition of some of the other elements found at or near this deposit has also been altered as a result of fission, neutron absorption, and radioactive decay. Use of separated isotopes. The areas in which separated (or enriched) isotopes are utilized have become fairly extensive, and a partial list includes nuclear research, nuclear power generation, nuclear weapons, nuclear medicine, and agricultural research. Various methods are employed to prepare separated isotopes. Mass spectroscopy is used in the United States and Russia to prepare inventories of separated stable isotopes. Distillative, exchange, and electrolysis processes have been used to produce heavy water (enriched in deuterium, 2H), which is used as a neutron moderator in some reactors. The uranium enrichment of 235U, which is used as a fuel in nuclear reactors, has mainly been accomplished by using the process of gaseous diffusion in uranium hexafluoride gas in very large plants. This method has the disadvantage of requiring large power consumption. Techniques which can overcome this problem and are finding increasing favor include centrifugal separation and laser isotope separation. See ISOTOPE SEPARATION; NUCLEAR REACTOR. For many applications there is a need for separated radioactive isotopes. These are usually obtained through chemical separations of the desired element following production by means of a suitable nuclear reaction. Separated radioactive isotopes are used for a number of diagnostic studies in nuclear medicine, including the technique of positron tomography. See NUCLEAR MEDICINE. Studies of metabolism, drug utilization, and other reactions in living organisms can be done with stable isotopes such as 13C, 15N, 18O, and 2H. Molecular compounds are “spiked” with these isotopes, and the metabolized products are analyzed by using a mass spectrometer to measure the altered isotopic ratios. The use of separated isotopes as tracers for
determining the content of a particular element in a sample by dilution techniques had broad applicability. See ISOTOPE DILUTION TECHNIQUES; RADIOISOTOPE (BIOLOGY). Atomic mass. Atomic masses are given in terms of an internationally accepted standard which at present defines an atomic mass unit (amu) as exactly equal to one-twelfth the mass of a neutral atom of 12 C in its electronic and nuclear ground states. On this scale the mass of a 12C atom is equal to 12.0 amu. Atomic masses can be determined quite accurately by means of mass spectroscopy, and also by the use of nuclear reaction data with the aid of the Einstein mass-energy relation. Atomic weight is the average mass per atom per amu of the natural isotopic composition of an element. Practically all atomic weights are now calculated by taking the sum of the products of the fractional isotopic abundances times their respective masses in amu. The atomic weight of a neutral atom of a nuclide is nearly equal to an integer value A, because the mass of both the neutron and proton is almost identically 1 amu. See ATOMIC MASS. Daniel J. Horen Bibliography. A. P. Dickin, Radiogenic Isotope Geology, 1994, reprint 1997; H. E. Duckworth, Mass Spectroscopy, 2d ed., 1990; K. Lajtha and R. Michener, Stable Isotopes in Ecology, 1994; D. S. Schimel, Theory and Application of Tracers, 1993; H. P. Taylor, J. R. O’Neill, and I. R. Kaplan, Stable Isotope Geochemistry, 1991; R. R. Wolfe, Radioactive and Stable Isotope Tracers in Biomedicine, 1992.
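The atomic-weight rule stated above (the abundance-weighted sum of isotopic masses) is easy to apply directly. The Python sketch below is illustrative only: the fractional abundances for chlorine are taken from the table, while the isotopic masses in atomic mass units are approximate values supplied here for the example.

```python
# Illustrative sketch: atomic weight as the abundance-weighted mean of isotopic masses.
# Abundances from the table above; masses (amu) are approximate values for the example.
chlorine = [
    # (mass number, atomic mass in amu (approx.), fractional abundance)
    (35, 34.96885, 0.7577),
    (37, 36.96590, 0.2423),
]

atomic_weight = sum(mass * abundance for _, mass, abundance in chlorine)
print(f"Atomic weight of chlorine ~ {atomic_weight:.3f} amu")   # about 35.45
```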
Isotope dilution techniques Analytical techniques that involve the addition to a sample of an isotopically labeled compound. Soon after the discovery of the stable heavy isotopes of hydrogen, carbon, nitrogen, and oxygen, their value in analytical chemistry was recognized. Stable isotopes were particularly useful in the analysis of complex mixtures of organic compounds where the isolation of the desired compound with satisfactory purity was difficult and effected only with low or uncertain yields. The addition of a known concentration of an isotopically labeled compound to a sample immediately produces isotope dilution if the particular compound is present in the sample. After thorough mixing of the isotopically labeled compound with the sample, any technique that determines the extent of the isotopic dilution suffices to establish the original concentration of the compound in the mixture. Isotope dilution techniques exploit the difficulty in the separation of isotopes, with the isotopically labeled “spike” following the analytically targeted compound through a variety of separation procedures prior to isotopic analysis. See ISOTOPE. The technique depends on the availability of a stable or radioisotope diluent with isotope abundance ratios differing markedly from those of the naturally occurring elements. With monoisotopic elements, such as sodium or cesium, radioactive elements of sufficiently long life can be used in isotope dilution
Isotope separation techniques. In 1954 M. G. Ingraham identified 68 elements that were analyzable by isotope dilution techniques. The list has since been expanded considerably. Biochemical applications. The original applications of isotope dilution were by biochemists interested in complex mixtures of organic compounds. In these studies care had to be taken to ensure the stability of the labeled compound and its resistance to isotopic exchange reactions. Nitrogen-15–labeled glycine for example, could be used to determine glycine in a mixture of amino acids obtained from a protein. Deuterium-labeled glycine could not be used reliably if the deuterium isotopes were attached to the glycine amino or carboxyl group, because in these locations deuterium is known to undergo exchange reactions with hydrogens in the solvent or in other amino acids. Deuterium is very useful in elemental isotopic analysis where total hydrogen or exchangeable hydrogen concentrations are desired. See BIOCHEMISTRY; DEUTERIUM. High-sensitivity applications. Applications of isotope dilution techniques have also been found in geology, nuclear science, and materials science. These applications generally focus on the very high sensitivity attainable with these techniques. Isotopes of argon, uranium, lead, thorium, strontium, and rubidium have been used in geologic age determinations of minerals and meteorites. Taking the estimated error as a measurement of sensitivity, isotopic dilution analyses of uranium have been done down to 4 parts in 1012 and on thorium to 8 parts in 109. Studies in geology and nuclear science require the determination of trace amounts of radiogenic products. If the half-life and decay scheme of the parent nuclide is known, then isotopic dilution determinations of parent and daughter isotopes provide a basis for the calculation of the age of the sample. If the age or history of the sample is known, then determination of the trace concentrations of isotopes provides information on pathways of nuclear reactions. See RADIOACTIVITY; ROCK AGE DETERMINATION. An example of the latter type of investigation is the determination of yields of cesium (Cs) from bombardment of gold or uranium with very high-energy protons. The use of radioactive 131Cs with techniques of high-sensitivity mass spectrometry permitted the determination of stable and long-lived Cs isotopes in concentrations of less than 10−15 g. The fundamental limitation in the sensitivity of isotope dilution as a trace analytical technique is in contamination of reagents used in the analytical procedure. In the case of the study on cesium cited above, reagent water prepared by triple distillation in quartz columns was found to have approximately 10−15 g of natural 133Cs per cubic centimeter. Studies using isotope dilution techniques have dealt with trace analysis of toxic materials in biological systems. Uranium-233 has been used as an isotopic spike in the determination of uranium in bovine liver and in oyster tissue. Assay of 10−10 g of uranium with a precision of better than 0.5% was achieved. Contamination of reagents remains
the limiting factor in sensitivity, with water assayed at the 4.5–22-femtogram level by isotope dilution. Uranium is an example of an element that is a potent biotoxin as well as a radiological hazard. Its assay in trace concentrations is of considerable importance in toxicology. See RADIATION BIOLOGY; TOXICOLOGY. Lewis Friedman Bibliography. J. H. Chen and G. J. Wasserburg, Isotopic determination of uranium in picomole and subpicomole quantities, Anal. Chem., 53:2060–2067, 1981; M. G. Ingraham, Stable isotope dilution as an analytical tool, Annu. Rev. Nucl. Sci., 4:81–92, 1954; J. A. Jonckheere, A. P. DeLeenheer, and H. L. Steyaert, Statistical evaluation of calibration curve nonlinearity in isotope dilution gas chromatography/mass spectrometry, Anal. Chem., 55:153–155, 1983; J. F. Pickup and K. McPherson, Theoretical considerations in stable isotope dilution mass spectrometry for organic analysis, Anal. Chem., 48:1885– 1890, 1976; D. C. Reamer and C. Veillon, Determination of selenium in biological materials by stable isotope dilution gas chromatography-mass spectrometry, Anal. Chem., 53:2166–2169, 1981.
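The arithmetic underlying the method reduces to a single mass balance between sample and spike. The sketch below illustrates only that relation; the isotope abundances, spike amount, and sample size are invented for the example and are not taken from the assays cited above.

```python
# Isotope dilution: infer the amount of analyte originally in a sample
# from the isotope ratio measured after adding a known amount of spike.
# n_s = moles of spike; a_*, b_* = fractional abundances of the two
# reference isotopes a and b; r_m = measured a/b ratio in the blend.

def isotope_dilution(n_s, a_s, b_s, a_x, b_x, r_m):
    """Moles of analyte present in the sample before spiking."""
    return n_s * (a_s - r_m * b_s) / (r_m * b_x - a_x)

# Invented example: natural material (1% isotope a) spiked with 1 nmol
# of material enriched to 95% isotope a.
n_s, a_s, b_s = 1.0e-9, 0.95, 0.05     # spike
a_x, b_x = 0.01, 0.99                  # sample (natural abundances)
n_x_true = 2.0e-9                      # pretend unknown, used only to form r_m
r_m = (n_x_true * a_x + n_s * a_s) / (n_x_true * b_x + n_s * b_s)

print(isotope_dilution(n_s, a_s, b_s, a_x, b_x, r_m))  # ~2.0e-9 mol
```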
Isotope separation The physical separation of different isotopes of an element from one another. The different isotopes of an element as it occurs in nature may have similar chemical properties but completely different nuclear reaction properties. Therefore, nuclear physics and nuclear energy applications often require that the different isotopes be separated. However, similar physical and chemical properties make isotope separation by conventional techniques unusually difficult. Fortunately, the slight mass difference of isotopes of the same element makes separation possible by using especially developed processes, some of which involve chemical industry distillation concepts. See ISOTOPE. Isotope separation depends on the element involved and its industrial application. Uranium isotope separation has by far the greatest industrial importance, because uranium is used as a fuel for nuclear power reactors, particularly the predominant light-water reactor type. The two main isotopes found in nature are 235U and 238U, which are present in weight percentages (w/o) of 0.711 and 99.283, respectively. Trace amounts of 234U are also present in natural uranium. In order to be useful as a fuel for light-water reactors, the weight percentage of 235U must be increased to between 2 and 5. The process of increasing the 235U content is known as uranium enrichment, and the process of enriching is referred to as performing separative work. See NUCLEAR FUELS; NUCLEAR REACTOR; URANIUM. The production of heavy water is another example of isotope separation. Because of its exceptional neutron slowing-down characteristics, heavy water is used as a moderator in some types of natural and very low enriched uranium-fueled nuclear reactors. Heavy water is obtained by isotope separation of
light hydrogen (1H) and heavy hydrogen (2H) in natural water. Heavy hydrogen is usually referred to as deuterium (D). All natural waters contain 1H and 2H, in concentrations of 99.985 and 0.015 w/o, respectively, in the form of H2O and D2O (deuterium oxide). Isotope separation increases the concentration of the D2O, and thus the purity of the heavy water. See DEUTERIUM; HEAVY WATER. The development of laser isotope separation technology provided a range of potential applications. For example, enrichment of natural gadolinium in the isotope 157Gd could result in the production of very high quality material for reactivity control of light-water reactors. Other isotope separation applications range from space-flight power sources (238Pu) to medical magnetic resonance imaging (13C) and medical research (15O). Separation technologies. The isotope separation process that is best suited to a particular application depends on the state of technology development as well as on the mass of the subject element and the quantities of material involved. Processes such as electromagnetic separation which are suited to research quantities of material are generally not suited to industrial separation quantities. However, the industrial processes that are used, gaseous diffusion and gas centrifugation, are not suited to separating small quantities of material. Uranium isotope separation by either the gaseous diffusion or gas centrifuge process requires that the uranium be in the form of uranium hexafluoride (UF6) gas. Both separation processes take advantage of the slight difference in the mass of the 235U and 238U atoms. For gaseous diffusion, the separator stage may consist of a single gas diffuser, gas compres-
Fig. 1. Gaseous diffusion stage. (After U.S. Atomic Energy Commission Report ORO-684, January 1972)
sor, gas cooler, compressor drive motor, and associated piping and valving, while a gas-centrifuge stage may consist of many centrifuge machines and associated hardware operating in parallel. In either case, the unit or units operating in parallel on material of the same mole fraction form a stage. Each stage has feed, enriched product, and depleted waste streams and a characteristic stage-separation efficiency factor. Stages connected in series to obtain progressive separation are referred to as cascades. In the course of accomplishing the separation of 235 U and 238U in each of the separation stages (either diffusion or centrifuge), large amounts of uranium hexafluoride gas are recirculated. Application of chemical engineering theory using abundance ratios, reflux ratios, and flow-rate considerations, analogous to distillation theory, yields a term for the total flow of uranium in the separation cascade consisting of the product of two terms. One term is a function of the stage-separation factor, which is dependent on the molecular weights of the isotopes being separated, and the other is a function of the flow rates and isotopic composition of the feed, product, and waste. This second term is called separative work (SW), and is usually referred to in the following units: either kgU SWU or SWU in the United States, and either kgSW or tonnes SW in other countries. Gaseous diffusion. In the gaseous diffusion process, uranium hexafluoride gas is circulated under pressure through specially designed equipment containing porous barriers. The barriers contain hundreds of millions of pores per square inch, with the average diameter of the pores being only a few millionths of an inch. The kinetic energies of the uranium hexafluoride molecules depend only on the temperature, which is the same within each stage. Since the kinetic energy of a molecule is proportional to the product of its mass and the square of its speed, the lighter molecules (235U) have higher speeds causing them to strike the barrier more frequently than the heavy molecules (238U), and therefore they have a higher probability of penetration. Thus, the lighter molecules (235U) diffuse (or more correctly, effuse) through the barriers faster than the heavy molecules (238U), with the result that the downstream side of the barrier is enriched in the lighter gas and the upstream side is depleted in the lighter gas. See DIFFUSION; KINETIC THEORY OF MATTER. The stage-separation factor for the gaseous diffusion process depends only on the molecular weights of 235U19F6 and 238U19F6, which are 349 and 352, respectively; it is equal to 1.0043. Because the maximum possible value of the separation factor is so close to unity, a gaseous diffusion plant typically needs a very large number of diffuser-compressorcooler units (Fig. 1) connected in series in order to obtain sufficiently enriched product. For example, approximately 1400 stages are needed to produce the 4 w/o 235U that is typical of that used in lightwater reactors. The throughput in each stage of a large diffusion plant can require large compressors driven by motors of several thousand horsepower (several megawatts).
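The separative-work accounting described above is conventionally written with the value function V(x) = (2x − 1) ln[x/(1 − x)], the separative work of a job being P·V(xp) + W·V(xw) − F·V(xf). The sketch below evaluates this standard expression for an illustrative light-water-reactor job (4 w/o product from 0.711 w/o feed, with an assumed 0.3 w/o tails assay) and also checks the ideal gaseous-diffusion stage factor quoted above.

```python
from math import log, sqrt

def value(x):
    """Value function used in separative-work (SWU) accounting."""
    return (2.0 * x - 1.0) * log(x / (1.0 - x))

# Illustrative enrichment job (assays as weight fractions of 235U).
P, x_p = 1.0, 0.040          # 1 kg of 4 w/o product
x_f, x_w = 0.00711, 0.003    # natural feed; assumed tails assay

F = P * (x_p - x_w) / (x_f - x_w)   # feed required, from mass balance
W = F - P                           # tails produced

swu = P * value(x_p) + W * value(x_w) - F * value(x_f)
print(round(F, 1), round(swu, 2))   # about 9.0 kg feed and 5.28 kg SWU

# Ideal stage separation factor for gaseous diffusion of UF6 (square root
# of the molecular-weight ratio 352/349 given in the text):
print(sqrt(352.0 / 349.0))          # ~1.0043
```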
Gas centrifugation. The gas centrifuge process has the advantage that its isotope separation capability for uranium is proportional to the absolute mass difference of the uranium hexafluoride molecules, rather than the relative difference, which is the case for gaseous diffusion. The separation effect is achieved by pressure diffusion through the gravitational field generated within the centrifuge rotor. The concentration of the heavy molecule tends to increase toward the rotor wall, while that of the lighter molecule tends to increase at the center of the rotor. The radial separation factor of a gas centrifuge machine depends on the angular velocity of rotation, the radius of the rotor wall, and the absolute gas temperature. For a rotor radius of 6 cm (2.4 in.), a rotational speed of 40,000 revolutions per minute (corresponding to a peripheral speed of 250 m/s or 800 ft/s), and a gas temperature of 300 K (80°F), the separation factor is 1.0387; the difference of this separation factor from unity is nine times that of the separation factor for the gaseous diffusion process. The separation capability is approximately inversely proportional to the absolute temperature, and operating at a low or at least moderate temperature is therefore desirable. However, an adequate temperature margin must always be maintained to ensure that the uranium hexafluoride process gas does not condense. In a practical centrifuge, an internal countercurrent circulation flow is introduced (Fig. 2). This flow has the effect of generating an axial concentration difference which is a factor larger than the elementary radial separation factor discussed above. Thus, even a centrifuge of very modest speed has a separation factor greatly exceeding that of the gas diffusion process. The countercurrent flow also permits the enriched and depleted fractions to be removed at the end of the rotor. In practice, centrifuge throughput has a strong dependence on the rotor peripheral speed and length. However, rotor material composition limits peripheral speed since the rotor will break if induced stresses exceed the tensile strength of the material. See CENTRIFUGATION. Laser separation. Three experimental laser isotope separation technologies for uranium are the atomic vapor laser isotope separation (AVLIS) process, the uranium hexafluoride molecular laser isotope separation (MLIS) process, and the separation of isotopes by laser excitation (SILEX) process, which was announced in Australia in the early 1990s. The AVLIS process, which is more experimentally advanced than the MLIS and SILEX processes, exploits the fact that the different electron energies of 235U and 238U absorb different colors of light (that is, different wavelengths). In the AVLIS process, lasers are tuned to emit a combination of colors that will be absorbed only by a 235U atom, which then emits an electron. This photoionization leaves the 235U atom with a net positive charge, allowing the atom, now an ion, to be selectively separated by using electromagnetic fields. AVLIS technology is inherently more efficient than either the gaseous diffusion or gas centrifuge
Fig. 2. Gas centrifuge machine. (After M. Benedict, T. H. Pigford, and H. W. Levi, Nuclear Chemical Engineering, 2d ed., McGraw-Hill, 1981)
processes. It can enrich natural uranium to 4 w/o 235U in a single step. In the United States, the AVLIS process is being developed to eventually replace the gaseous diffusion process for commercially enriching uranium. See PHOTOIONIZATION. The AVLIS process includes two major component systems: a laser system and a separator system (Fig. 3). The laser system uses two types of lasers: dye lasers that generate the light used for photoionization of the uranium, and copper-vapor lasers that energize the dye lasers. Powerful green-yellow light from electrically driven copper-vapor lasers is converted to red-orange light in the dye laser. This red-orange light is tuned to the precise colors that are absorbed by 235U but not by 238U. This laser-pumped-laser scheme is required because dye lasers cannot be powered directly by electricity. See LASER. In the separator system, uranium metal is melted and vaporized by means of an electron beam that
Fig. 3. Atomic vapor laser isotope separation (AVLIS) process. (After Lawrence Livermore National Laboratory Report LLL-TB-072, 1985)
Fig. 4. Cross section of slit-shaped separation nozzle system with streamlines of the gas flow. (After M. Benedict, T. H. Pigford, and H. W. Levi, Nuclear Chemical Engineering, 2d ed., McGraw-Hill, 1981)
creates an atomic vapor stream of 235U and 238U. The tuned dye laser beams are passed through the vapor stream, where they photoionize the 235U atoms. An electromagnetic field deflects the selected photoions to the product collector, where they condense. The 238U atoms, which are unaffected by the color-selective laser beams, pass through the product collector to condense on the tails collector. The enriched uranium liquid-metal condensate flows out of the separator to be cast and stored in solid metallic form, or to be converted to uranium hexafluoride or oxide as required. Nozzle process. In the German jet nozzle or Becker process, which is a laboratory process, uranium hexafluoride gas is diluted in a helium or hydrogen carrier gas, and fed by compressor into the nozzle on one side of a 1–2-m long (3–6-ft) semicircular slot in which there is a deflector blade on the other side to separate enriched and depleted streams (Fig. 4). The semicircular wall of the slot provides a partial centrifugal effect such that the heavier isotope tends to stay closer to the wall compared to the lighter isotope. The radius of the semicircular slot is of the order of 0.1 mm and is formed by stacking photoetched foils. Helikon process. The South African Helikon process is aerodynamically similar to the Becker process from which it was developed. However, it is believed to use a vortex tube instead of a slot and opposing gas flows instead of a deflector blade for flow separation. South Africa’s small power-intensive commercial Helikon uranium enrichment plant was shut down in the mid-1990s because of its high cost of operation. Thermal diffusion. The separation of isotopes by thermal diffusion is based on the fact that when a temperature gradient is established in a mixture of uniform composition, one component will concentrate near the hot region and the other near the cold region. Thermal diffusion is carried out in the annular space between two vertical concentric pipes, the inner one heated and the outer one cooled. Thermal diffusion was used in the United States during World War II for uranium, but it is not efficient for industrial separation. It has been used to separate small quantities of isotopes for research purposes.
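The radial separation factor quoted earlier for the gas centrifuge follows from the standard ideal expression α = exp(ΔM v²/2RT), with ΔM the molar-mass difference of the two uranium hexafluoride species and v the peripheral speed. The short check below uses the rotor radius, speed, and temperature given in the text.

```python
from math import exp, pi

delta_M = 0.352 - 0.349        # kg/mol, 238UF6 minus 235UF6
radius = 0.06                  # m (6-cm rotor radius)
rpm = 40000.0                  # revolutions per minute
R, T = 8.314, 300.0            # gas constant J/(mol K); temperature K

v = 2.0 * pi * radius * rpm / 60.0          # peripheral speed, m/s
alpha = exp(delta_M * v**2 / (2.0 * R * T))
print(round(v, 1), round(alpha, 4))         # ~251.3 m/s, ~1.0387
```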
Chemical exchange. The chemical exchange process has proved to be the most efficient for separating isotopes of the lighter elements. This process is based on the fact that if equilibrium is established between, for example, a gas and a liquid phase, the composition of the isotopes will be different in the two phases. Thus, if hydrogen gas is brought into equilibrium with water, the ratio of deuterium to hydrogen is several times greater in the water than in the hydrogen gas. By repeating the process in a suitable cascade, it is possible to effect a substantial separation of the isotopes with a relatively small number of stages. The chief use of chemical exchange is in the large-scale production of heavy water. Chemical exchange has also been used for the large-scale separation of other isotopes. For example, the isotopes of boron have been separated by fractional distillation of the boron trifluoride–dimethyl ether complex. See CHEMICAL EQUILIBRIUM. Distillation. The separation of isotopes by distillation is much less efficient than separation by other methods. Distillation was used during World War II to produce heavy water, but the cost was high and the plants are no longer in existence. See DISTILLATION. Electrolysis. Electrolysis of water is the oldest large-scale method of producing heavy water. Under favorable conditions, the ratio of hydrogen to deuterium in the gas leaving a cell in which water is electrolyzed is eight times the ratio of these isotopes in the liquid. Because it is electricity-intensive, the process is very expensive unless low-cost hydroelectric generating capacity is available. Electrolysis is used in the United States only as a finishing step to concentrate heavy water to final-product specifications. See ELECTROLYSIS. Electromagnetic process. Electromagnetic separation was the method which was first used to prove existence of isotopes. The mass spectrometer and mass spectrograph are still widely used as research tools. In the electromagnetic process, vapors of the material to be analyzed are ionized, are accelerated in an electric field, and enter a magnetic field which causes the ions to be bent in a circular path. Since the light ions have less momentum than the heavy ions, they are bent through a circle of smaller radius, and the two isotopes can be separated by placing collectors at the proper location. The large-capacity electromagnetic uranium isotope separators developed in the United States during World War II were called calutrons. See CHARGED PARTICLE OPTICS; MASS SPECTROSCOPE. Julian J. Steyn Bibliography. M. Benedict, T. H. Pigford, and H. W. Levi, Nuclear Chemical Engineering, 2d ed., 1981; Edison Electric Institute, EEI Enrichment Handbook, Rep. NFC-90-001, 1990.
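The practical advantage of a large single-stage factor, noted above for chemical exchange and electrolysis, can be seen by counting ideal cascade stages: each enriching stage multiplies the isotope abundance ratio by roughly its separation factor α, so an overall ratio gain G requires about ln G / ln α stages. The factors and target gain below are illustrative only.

```python
from math import log

def stages_needed(gain, alpha):
    """Ideal enriching stages needed to multiply the abundance ratio by gain."""
    return log(gain) / log(alpha)

gain = 10.0                                   # arbitrary target ratio gain
print(round(stages_needed(gain, 1.0043)))     # diffusion-like factor: ~537 stages
print(round(stages_needed(gain, 3.0)))        # exchange-like factor: ~2 stages
```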
Isotope shift A small difference between the different isotopes of an element in the transition energies corresponding to a given spectral-line transition. For an electronic spectral-line transition between two energy levels a
and b in an atom or ion with atomic number Z, the small difference ΔEab = Eab(A) − Eab(A′) in the transition energy between isotopes with mass numbers A and A′ is the isotope shift. It consists largely of the sum of two contributions, the mass shift (MS) and the field shift (FS), also called the volume shift. The mass shift is customarily divided into a normal mass shift (NMS) and a specific mass shift (SMS); each is proportional to the factor (A′ − A)/AA′. The normal mass shift is a reduced mass correction that is easily calculated for all transitions. The specific mass shift is produced by the correlated motion of different pairs of atomic electrons and is thus absent in one-electron systems. It is generally difficult to calculate precisely the specific mass shift, which may be 30 times larger than the normal mass shift for some transitions. The field shift is produced by the change in the finite size and shape of the distribution of nuclear charge (protons) when the number of neutrons N = A − Z varies with Z fixed. Since electrons whose orbits penetrate the nucleus are influenced most, S-P and P-S transitions have the largest field shift. Measurements of isotope shifts, coupled with accurate theoretical calculations of mass shifts and relativistic and other corrections, can be used to determine the change δ⟨rc²⟩ in the mean-square nuclear charge radius ⟨rc²⟩ as N varies with Z fixed. For very light elements, Z ≲ 37, the mass shift dominates the field shift. For Z = 1, the 0.13-nanometer shift (0.02%) in the red Balmer line led to the discovery of deuterium, the A = 2 isotope of hydrogen. For medium-heavy elements, 38 ≲ Z ≲ 57, the mass shift and field shift contributions to the isotope shift are comparable. For heavier elements, Z ≳ 58, the field shift dominates the mass shift. A representative case is shown in the illustration. See ATOMIC STRUCTURE AND SPECTRA; DEUTERIUM; RYDBERG CONSTANT. When isotope shift data have been obtained for at least two pairs of isotopes of a given element, a graphical method introduced by W. H. King in 1963 can be used to evaluate quantitatively the separate contributions of the mass shift and the field shift. Experimentally determined field shifts can be used to test theoretical models of nuclear structure, shape, and multipole moments. Experimentally determined specific mass shifts can be used to test detailed theories of atomic structure and relativistic effects. See NUCLEAR MOMENTS; NUCLEAR STRUCTURE. Some atomic transitions exhibit hyperfine structure,
Some isotope shifts in the green line of mercury, Z = 80. In this heavy element, the contribution of the field shift is much larger than that of the mass shift.
which generally is dominated by the interaction of electron orbital and nuclear magnetic dipole moments, the latter existing for isotopes with nonzero nuclear spin. Accurate measurements of magnetic hyperfine structures exhibiting the so-called hyperfine anomaly (Bohr-Weisskopf effect) give information about how the distribution of nuclear magnetization changes as N varies with Z fixed. See HYPERFINE STRUCTURE. Experimental techniques that have greatly increased both the amount and the precision of isotope shift data include energetic nuclear reactions for online production of isotopes with half-lives as short as milliseconds, and subsequent high-resolution laser-spectroscopic studies of ions or atoms containing them. Laser spectroscopy of collimated, fast-moving beams of ions or atoms containing the isotopes uses techniques that exploit (rather than are hampered by) the Doppler shift caused by the motion of the ions or atoms through the laser light. Techniques developed for cooling and trapping of neutral atoms or highly charged ions with electromagnetic fields have been extended to short-lived isotopes and allow as few as one trapped atom or ion to be detected. Isotope shifts measured at low Z have tended to focus on precise tests of atomic few-body dynamics and determination of ⟨rc²⟩ for short-lived, exotic “halo nuclei” such as 6He and 11Li. Isotope shifts measured with laser or x-ray methods at high Z have tended to focus on determination of δ⟨rc²⟩ and tests of atomic structure calculations where many-body, relativistic, quantum-electrodynamical (QED), and other interactions are significant and difficult to treat. These must be well understood for atomic parity-non-conservation (PNC) experiments, which provide stringent and useful tests of electroweak and other theories when different isotopes of high-Z atoms are used. See ATOMIC THEORY; DOPPLER EFFECT; ELECTROWEAK INTERACTION; LASER; LASER COOLING; LASER SPECTROSCOPY; PARITY (QUANTUM MECHANICS); PARTICLE TRAP; QUANTUM ELECTRODYNAMICS; RELATIVISTIC QUANTUM THEORY. Isotope shift data have also been obtained for x-ray transitions of electrons in inner atomic shells and of muons in muonic atoms. See LEPTON. Peter M. Koch Bibliography. J. Bauche and R. J. Campeau, Recent advances in the theory of atomic isotope shift, Advan. At. Mol. Phys., 12:39–86, 1976; J. C. Berengut, V. V. Flambaum, and M. G. Kozlov, Calculation of relativistic and isotope shifts in Mg I, Phys. Rev. A, 72:044501, 2005; S. R. Elliott, P. Beiersdorfer, and M. H. Chen, Trapped-ion technique for measuring the nuclear charge radii of highly charged radioactive isotopes, Phys. Rev. Lett., 76:1031–1034, 1995; E. Gomez, L. A. Orozco, and G. D. Sprouse, Spectroscopy with trapped francium: Advances and perspectives for weak interaction studies, Rep. Prog. Phys., 69:79–118, 2006; R. Sánchez et al., Nuclear charge radii of 9,11Li: The influence of halo electrons, Phys. Rev. Lett., 96:033002, 2006; L.-B. Wang et al., Laser spectroscopic determination of the 6He nuclear charge radius, Phys. Rev. Lett., 93:142501, 2004.
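The King-plot construction mentioned above can be sketched numerically: the shifts of two transitions, each multiplied by AA′/(A′ − A), fall on a straight line whose slope and intercept separate the field-shift and mass-shift contributions. The mass numbers and shift values below are invented solely to show the construction.

```python
# Hypothetical isotope pairs (A, A') and isotope shifts of two transitions
# (arbitrary frequency units); all numbers are invented for illustration.
pairs    = [(100, 102), (102, 104), (104, 106)]
shift_t1 = [1.20, 1.05, 0.95]
shift_t2 = [2.60, 2.35, 2.18]

def modified(shifts):
    """Scale each shift by A*A'/(A' - A), the King-plot coordinate."""
    return [s * a * ap / (ap - a) for s, (a, ap) in zip(shifts, pairs)]

x, y = modified(shift_t1), modified(shift_t2)

# Least-squares line y = slope*x + intercept: the slope is the ratio of the
# field-shift factors of the two transitions; the intercept carries the
# difference in their (specific) mass shifts.
n = len(x)
xm, ym = sum(x) / n, sum(y) / n
slope = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sum(
    (xi - xm) ** 2 for xi in x)
intercept = ym - slope * xm
print(slope, intercept)
```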
Isotopic irradiation The uses of the radiation emitted by radioactive isotopes (radioisotopes), principally in industry and medicine. Although this article deals primarily with gamma irradiation, many other forms of radiation from many different sources are often used for irradiation purposes. The use of radioactive isotopes to irradiate various materials can be divided into a number of major areas, including industrial, medical, and research (Table 1). For most applications the radioactive isotope is placed inside a capsule, but for some medical applications it is dispersed in the material to be irradiated. See RADIOISOTOPE. The radiation from radioisotopes produces essentially the same effect as the radiation from electron linear accelerators and other high-voltage particle accelerators, and the choice of which to use is based primarily on convenience and cost. Although the radioisotope radiation source does not require the extensive and complex circuitry necessary for a high-voltage radiation source, its radiation is always present and requires elaborate shielding for health protection and specialized mechanisms for bringing the irradiated objects into and out of the radiation beam. Further, the radiation output decreases with time according to the half-life of the radioisotope, which must therefore be replaced periodically. See PARTICLE ACCELERATOR. Industrial applications. The two main radioisotopes for industrial processing are cobalt-60 with a half-life of 5.271 years and an average gamma-ray energy of 1.25 MeV, and cesium-137 with a half-life of 30.07 years and a gamma-ray energy of 0.662 MeV. See CESIUM; COBALT. The application of industrial irradiation is increasing, with sterilization of medical disposables using cobalt-60 gamma rays being the most common. This includes the irradiation of plastic syringes, surgical gloves, and nonwoven drapes and gowns. In addition to the medical supplies, gamma sterilization of disposable laboratory supplies—petri dishes, pipettes, and test tubes—is also a fast-growing application. A promising and appealing application is food preservation, including the reduction of postharvest losses by the elimination of pests with irradiation. Other food irradiation effects can be accomplished at distinct irradiation doses (Table 2). Regulatory bodies in several countries approve the consump-
TABLE 2. Food irradiation effects

Effect                           Irradiation dose
Inhibition of sprouting          6–15 krad (60–150 Gy)
Disinfection                     20–100 krad (200–1000 Gy)
Control of parasites             20–200 krad (200–2000 Gy)
Shelf-life extension             200–500 krad (2000–5000 Gy)
Extended refrigerated storage    200–500 krad (2000–5000 Gy)
Elimination of pathogens         300–700 krad (3000–7000 Gy)
Reduction of microbial load      Up to 1 Mrad (10 kGy)
Sterilization                    Over 1 Mrad (10 kGy)
tion of over 20 irradiated foods. Extensive research on this subject has been done at the Bhabha Atomic Research Center in Trombay, India, where the need for bulk sterilization of food is more pressing. The biocidal effect of the gamma-irradiation process is effective for the control of microbiological contamination in many raw materials used by the pharmaceutical and cosmetic industries. Such materials include distilled water, starch powder, rice powder, lactose, talc, cocoa butter, inorganic pigments, petroleum jelly, and enzymes. Other applications include irradiation of male insects for pest control, sterilization of corks, sterilization of laboratory animal bedding, preparation of vaccines, degradation of poly(tetrafluoroethylene) scraps used in lubricants, cross-linking of polyethylene for shrink films, and production of wood-polymer and concrete-polymer composites. Cesium-137 is also used for sludge processing to produce fertilizer or cattle feed. It is available in large quantities in the reprocessing of fuel elements from nuclear reactors, and the process helps solve the problem of disposing of cesium from this source. See NUCLEAR FUELS REPROCESSING. Medical applications. Radioisotopes are used in the treatment of cancer by radiation. Encapsulated sources are used in two ways: the radioisotopes may be external to the body and the radiation allowed to impinge upon and pass through the patient (teletherapy), or the radiation sources may be placed within the body (brachytherapy). For teletherapy purposes, cobalt-60 is the most commonly used isotope, with source strengths ranging from several thousand curies up to 10^4 Ci (3.7 × 10^14 becquerels). Some cesium-137 irradiators have been built, but cesium-137 radiation is not as penetrating as that from cobalt-60.
TABLE 1. Most commonly used radioisotopes and major applications

Isotope                              Half-life      Average γ-ray energy, MeV      Main uses
Radium-226 in equilibrium (226Ra)    1600 years     0.83                           Medical (brachytherapy)
Gold-198 (198Au)                     2.695 days     0.412                          Medical (brachytherapy—permanent implants)
Iridium-192 (192Ir)                  73.8 days      0.38                           Medical (brachytherapy)
Cesium-137 (137Cs)                   30.07 years    0.662                          Industrial, medical (brachytherapy), research irradiators
Cobalt-60 (60Co)                     5.271 years    1.25                           Industrial, medical (teletherapy), research irradiators
Iodine-125 (125I)                    59.4 days      0.035                          Medical (brachytherapy—permanent implants)
Iodine-131 (131I)                    8.021 days     0.187, average β-ray energy    Medical (dispersal technique)
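The half-lives in Table 1 set how quickly a sealed source loses strength and hence how often it must be replaced, as noted above for radioisotope sources in general. A one-line estimate using the cobalt-60 half-life from the table:

```python
def activity_fraction(t_years, half_life_years):
    """Fraction of the initial source activity remaining after t years."""
    return 0.5 ** (t_years / half_life_years)

# Cobalt-60 source, half-life 5.271 years (Table 1):
print(round(activity_fraction(5.0, 5.271), 2))   # ~0.52 of initial output after 5 years
```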
Historically, radium-226 has been used as the brachytherapy encapsulated source, but due to serious problems that can arise from a health physics aspect if the capsule breaks, radium is being replaced with cesium-137 sources. Iridium-192 is also used for brachytherapy applications. See HEALTH PHYSICS. In order to avoid the long half-life of radium, its daughter product radon-222, which is an inert gas, was used in very small glass or gold seeds. Today radon has been replaced by iodine-125 or gold-198 seeds, which are put directly in a tumor and permanently left in place. See RADIUM. For some medical applications the radioisotope is dispersed in the body; the most commonly used is iodine-131. The specific advantage of internal therapy with iodine-131 is that the thyroid gland concentrates the element iodine. When the radioisotope is administered either orally or intravenously in a highly purified form, it goes to the thyroid, where certain forms of thyroid disorders and cancers can be treated by the radiation. Unlike the other radioisotopes discussed here, iodine-131's therapeutic effectiveness depends upon the beta rays emitted, not the gamma rays. See IODINE. Another medical use for radioisotopes is the requirement that all transfusions of blood be irradiated before being given to certain patients so that most of the lymphocytes present are destroyed. For this purpose, self-contained irradiators are used. Cobalt-60 or cesium-137 encapsulated sources are permanently fixed and sealed off within the radiation shield, and the irradiation chamber is introduced into the radiation field by rotation. Such units are also extensively used for radiobiological and other research purposes. See RADIOACTIVITY AND RADIATION APPLICATIONS; RADIOLOGY. Peter R. Almond Bibliography. D. A. Bromley, Neutrons in science and technology, Phys. Today, 36(12):31–39, December 1983; Institute of Food Technologists, Food irradiation (a collection of articles), Food Technol., 27(2):38–60, February 1983; T. Ouwerkerk, An overview of the most promising industrial applications of gamma processing, International Symposium on Application and Technology of Ionizing Radiations, Riyadh, Saudi Arabia, March 1982; M. Satin, Food Irradiation: A Guidebook, 2d ed., 1996.
Ixodides A suborder of the Acarina, class Arachnida, comprising the ticks. Ticks differ from mites, their nearest relatives, in their larger size and in having a pair of breathing pores, or spiracles, behind the third or fourth pair of legs. They have a gnathosoma (or so-called head or capitulum), which consists of a base (basis capituli), a pair of palps, and a rigid, elongated, ventrally toothed hypostome which anchors the parasite to its host. They also have a pair of protrusible cutting organs, or chelicerae, which permit the insertion of the hypostome. The stages in the life cycle are egg, larva, nymph, and adult. The larvae have three pairs of legs; nymphs and adults, four. The 600 or so known species are all bloodsucking, external parasites of vertebrates including amphibians, reptiles, birds, and mammals. Ticks are divided into three families, Argasidae, Ixodidae, and Nuttalliellidae. The last contains but one exceedingly rare African species, Nuttalliella namaqua, morphologically intermediate between the Argasidae and the Ixodidae. It is of no known importance, either medically or economically. Argasidae. Argasids, or the soft ticks, differ greatly from ixodids in that the sexes are similar; the integument of adults and nymphs is leathery and wrinkled; there is no dorsal plate, or scutum; the gnathosoma is ventral in adults and nymphs but anterior in larvae; and the spiracles are small and anterior to the hindlegs (Fig. 1). These ticks frequent nests, dens, and resting places of their hosts. Adults feed intermittently and eggs are laid a few at a time in niches where the females seek shelter. Larvae feed for a few minutes to several days, then detach and transform to nymphs which feed and molt several times before transforming to adults. Nymphs and adults are notably resistant to starvation; some are known to live 10 years or longer without feeding. The family contains about 85 species, with 20 in the genus Argas and 60 in Ornithodoros. Several are of medical or veterinary importance. Argas persicus, A. miniatus, A. radiatus, A. sanchezi, and A. neghmei are serious pests of poultry. Some of these species carry fowl spirochetosis, a disease with a high mortality rate. Larvae and nymphs of Otobius megnini, the spinose ear tick, feed deep in the ears of domesticated animals in many semiarid regions. Heavy infestations cause intense irritation which leads to unthriftiness and sometimes to death. The life history is unusual in that the adults do not feed. Ornithodoros moubata transmits
Fig. 1. Argasid tick, Ornithodoros coriaceus, shown enlarged.
Fig. 2. Ixodid tick, Dermacentor andersoni, female, shown enlarged.
relapsing fever in East, Central, and South Africa. It is a highly domestic parasite and humans are probably the chief host. Ornithodoros turicata, O. hermsi, O. talaje, and O. rudis are important vectors of relapsing fever in the Western Hemisphere, as is O. tholozani in Asia. The bites of most species that attack humans produce local and systemic reactions; the bites of some species, especially O. coriaceus of California and Mexico, are extremely venomous. See RELAPSING FEVER. Ixodidae. In contrast to argasids, Ixodidae have a scutum covering most of the dorsal surface of the male but only the anterior portion of females, nymphs, and larvae (Fig. 2). They are known as the hard ticks. The sexes are thus markedly dissimilar. The gnathosoma extends anteriorly, and the large spiracles are posterior to the hindlegs. Instead of frequenting nesting places, these ticks are usually more or less randomly distributed throughout their hosts’ environment. Larvae, nymphs, and adults feed but once, and several days are required for engorgement. The immature stages of most species drop to the ground for molting but those of the genus Boophilus
and a few others molt on the host. The female lays a mass containing up to 10,000 or more eggs. The life cycle is usually completed in 1–2 years. The family consists of about 11 well-defined genera with 500 species. Many species transmit disease agents to humans and animals; included are viruses, rickettsiae, bacteria, protozoa, and toxins. Transmission is by bite or by contact with crushed tick tissues or excrement. Virus diseases of humans include Colorado tick fever of the western United States, transmitted by Dermacentor andersoni, and Russian spring-summer encephalitis and related diseases in Europe and Asia, transmitted by Ixodes persulcatus. Some important rickettsial diseases of humans are spotted fever, widely distributed in the Western Hemisphere, of which D. andersoni, D. variabilis, Rhipicephalus sanguineus, and Amblyomma cajennense are important vectors; boutonneuse fever and related diseases of the Mediterranean region and Africa transmitted by R. sanguineus and some other species; and Q fever, the agent of which, although occurring in many ticks throughout the world, is not commonly transmitted directly by them. Tularemia (rabbit fever) is a bacterial disease transmitted by several species of ticks in the Northern Hemisphere. Tick paralysis, which is believed to be caused by toxins, occurs in many parts of the world and is produced by several species of ticks during the feeding process. The numerous tick-borne diseases of animals cause vast economic losses, especially in tropical and subtropical regions. Examples are babesiasis, a protozoan disease caused by species of Babesia including B. bigemina, the agent of the widely distributed Texas cattle fever, which is transmitted principally by species of Boophilus; East Coast fever of Africa, another protozoan disease, caused by Theileria parva, is carried by several species of Rhipicephalus. Aside from carrying disease, several species are extremely important pests of humans and animals. Heavy infestations of ticks produce severe anemia in livestock, and even death, from the loss of blood alone. See TICK PARALYSIS; TULAREMIA. Glen M. Kohls
J J/psi particle — Jute
J/psi particle
An elementary particle with an unusually long lifetime or, from the Heisenberg uncertainty principle, with an extremely narrow width = 86.6 ± 6.0 keV, and a large mass m = 3096.93 ± 0.09 MeV. It is a bound state containing a charm quark and an anticharm quark. The discovery of the J/psi particle is one of the cornerstones of the standard model. Discovery. Prior to 1974, there was no theoretical justification, and no predictions existed, for longlived particles in the mass region 1–10 GeV/c2. The J/psi particles are rarely produced in protonproton (p-p) collisions. Statistically, they occur once after hundreds of millions of subnuclear reactions, in which most of the particles are “ordinary” elementary particles, such as the kaon (K), pion (π), or proton (p). Searches for the J/psi particle are conducted by detecting its electron-positron (e+e−) decays. A two-particle spectrometer was used to discover this particle. A successful experiment must have: (1) a very high intensity incident proton beam to produce a sufficient amount of J particles for detection; and (2) the ability, in a billionth of a second, to pick out the J/psi → e−e+ pairs amidst billions of other particles through the detection apparatus. See PARTICLE ACCELERATOR. The detector is called a magnetic spectrometer. A positive particle and a negative particle each traversed one of two 70-ft-long (21-m) arms of the spectrometer. The e+ and e− were identified by the fact that a special counter, called a Cerenkov counter, measured their speed as being slightly greater than that of all other charged particles. Precisely measured magnetic fields bent them and measured their energy. Finally, as a redundant check, the particles plowed into high-intensity lead-glass and the e+ and e− immediately transformed their energy into light. When collected, this light “tagged” these particles as the e+ and e−, and not heavier particles such as the π, K, or p. The simultaneous arrival of an e− and
On-line data from August and October 1974, showing existence of J/psi particle. (After J. J. Aubert et al., Discovery of the new particle, J. Nuc. Phys., B89(1):1–18, 1975)
e+ in the two arms indicated the creation of high-energy light quanta from nuclear interactions. The sudden increase in the number of e+e− pairs at a given energy (or mass) indicated the existence of a new particle (see illus.). See CERENKOV RADIATION; PARTICLE DETECTOR. Properties. Since its discovery in 1974, more than 10^9 J/psi particles have been produced. More than 100 different decay modes and new particles radiating from the J/psi particle have been observed. The J/psi particle has been shown to be a bound state of charm quarks. The long lifetime of the J/psi results from its mass being less than the masses of particles which separately contain a charm and an anticharm quark. This situation permits the J/psi to decay only into noncharm quarks, and empirically this restriction was found to lead to a suppression of the decay rate resulting in a long lifetime and narrow width. The subsequent discovery of the b quark and the intermediate vector bosons Z0 and W±, and studies of Z0 decays into charm, b, and other quarks, show that the theory of the standard model is in complete agreement with experimental data to an accuracy of better than 1%. See CHARM; ELEMENTARY PARTICLE; INTERMEDIATE VECTOR BOSON; QUARKS; STANDARD MODEL; UPSILON PARTICLES. Samuel C. C. Ting Bibliography. O. Adriani et al., Results from the L3 experiment, Phys. Rep., 236:1–146, 1993; J. J. Aubert et al., Phys. Rev. Lett., 33:1404 and 1624, 1974; E. J. Augustin et al., Phys. Rev. Lett., 33:1406, 1974; W. Braunschweig et al., Phys. Lett., 53B:393, 1974; K. Hikasa et al., Review of particle properties, Phys. Rev. D, vol. 45, no. 11, pt. 2, 1992.
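The connection between the narrow width and the long lifetime cited at the start of this article is the uncertainty relation τ = ħ/Γ. Evaluating it with the width quoted above:

```python
hbar_eV_s = 6.582e-16      # reduced Planck constant, eV s
gamma_eV = 86.6e3          # J/psi width quoted above, eV

tau = hbar_eV_s / gamma_eV
print(tau)                 # ~7.6e-21 s

# For comparison, a width of ~100 MeV, typical of an ordinary strongly
# decaying resonance, corresponds to a lifetime of ~6.6e-24 s.
print(hbar_eV_s / 100e6)
```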
Jade A name that may be applied correctly to two distinct minerals. The two true jades are jadeite and nephrite. In addition, a variety of other minerals are incorrectly called jade. Idocrase is called California jade, dyed calcite is called Mexican jade, and green grossularite garnet is called Transvaal or South African jade. The most commonly encountered jade substitute is the mineral serpentine. It is often called new jade or Korean jade. The most widely distributed and earliest known true type is nephrite, the less valuable of the two. Jadeite, the most precious of gemstones to the Chinese, is much rarer and more expensive. Nephrite. Nephrite is one of the amphibole group of rock-forming minerals, and it occurs as a variety of a combination of the minerals tremolite and actinolite. Tremolite is a calcium-magnesium-aluminum silicate, whereas iron replaces the magnesium in actinolite. Although single crystals of the amphiboles are fragile because of two directions of easy cleavage, the minutely fibrous structure of nephrite makes it exceedingly durable. It occurs in a variety of colors, mostly of low intensity, including medium and dark green, yellow, white, black, and blue-gray. Nephrite has a hardness of 6 to 6½ on Mohs scale, a specific gravity near 2.95, and refractive indices of 1.61 to 1.64. On the refractometer, nephrite gemstones
show a single index near 1.61. Nephrite occurs throughout the world; important sources include Russia, New Zealand, Alaska, several provinces of China, and a number of states in the western United States. See AMPHIBOLE; GEM; TREMOLITE. Jadeite. Jadeite is the more cherished of the two jade minerals, because of the more intense colors it displays. It is best known in the lovely intense green color resembling that of emerald (caused by a small amount of chromic oxide). In the quality known as imperial jade, the material is at least semitransparent. White, green and white, light reddish violet, bluish violet, brown, and orange colors are also found. Jadeite also has two directions of easy cleavage, but a comparable random fibrous structure creates an exceptional toughness. Although jadeite has been found in California and Guatemala, the only important source of gem-quality material ever discovered is the Mogaung region of upper Burma. The hardness of jadeite is 6½ to 7, its specific gravity is approximately 3.34, and its refractive indices are 1.66 to 1.68; on a refractometer, only the value 1.66 is usually seen. See JADEITE. Richard T. Liddicoat, Jr.
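The constants quoted above (specific gravity and the single refractometer reading) are the working numbers used to tell the two jades apart. The toy decision rule below encodes only those figures and is an illustration, not a substitute for proper gem testing.

```python
def classify_jade(specific_gravity, refractometer_reading):
    """Rough nephrite/jadeite split based on the constants quoted above."""
    if specific_gravity < 3.1 and abs(refractometer_reading - 1.61) < 0.02:
        return "consistent with nephrite (SG ~2.95, RI ~1.61)"
    if specific_gravity > 3.2 and abs(refractometer_reading - 1.66) < 0.02:
        return "consistent with jadeite (SG ~3.34, RI ~1.66)"
    return "not consistent with either jade; further tests needed"

print(classify_jade(2.95, 1.61))   # nephrite-like stone
print(classify_jade(3.34, 1.66))   # jadeite-like stone
```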
Jadeite The monoclinic sodium aluminum pyroxene, NaAlSi2O6. Free crystals are rare. Jadeite usually occurs as dense, felted masses of elongated blades or as fine-grained granular aggregates. It has a Mohs hardness of 6.5 and a density of 3.25–3.35. It has a vitreous or waxy luster, and is commonly green but may also be white, violet, or brown. Jadeite exhibits the 93◦ and 87◦ pyroxene cleavages and a splintery fracture. It is extremely tough. Jadeite is always found in metamorphic rocks. It is associated with serpentine at Tawmaw, Burma; Kotaki, Japan; and San Benito County, California. It occurs in metasedimentary rocks of the Franciscan group in California and in Celebes. It is also found in Tibet; Yunan Province, China; and Guatemala. In addition, the jadeite “molecule” is present in high concentrations in solid solution in omphacite, a dark green pyroxene which, like garnet, is a major constituent of eclogite, a high-pressure, hightemperature metamorphic rock formed in the deep crust or upper mantle. Eclogites are found in the Bessi district, Japan; Mampong, Ghana; Gertrusk, Austria; Sonoma County, California; and as nodules in South African kimberlite pipes. See ECLOGITE; SOLID SOLUTION. Jadeite’s composition is intermediate between that of nepheline and albite, but jadeite does not crystallize in the binary system NaAlSiO4–NaAlSi3O8, or in the ternary system Na2O–Al2O3–SiO2, at low pressures. Its high density compared with the above tektosilicates and its presence in eclogite suggest that it is a high-pressure phase. This was confirmed by high pressure experiments on stability relationships between jadeite, nepheline, and albite (see illus.). Pure jadeite usually occurs with minerals indicative
Equilibrium curve for the reaction albite + nepheline → 2 jadeite. 1 kilobar = 10^8 pascals. °F = (°C × 1.8) + 32.
of relatively low temperatures (390–570◦F or 200– 300◦C) such as analcite, lawsonite, chlorite, glaucophane, stilpnomelane, and zeolites. Jadeite is valued as a precious stone for carvings. See HIGH-PRESSURE MINERAL SYNTHESIS; JADE; PYROXENE. Lawrence Grossman
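The phase relationship shown in the illustration corresponds to the balanced reaction below, written from the formulas given in this article; the denser jadeite assemblage is favored on the high-pressure side of the equilibrium curve.

```latex
\mathrm{NaAlSi_3O_8\ (albite)} \;+\; \mathrm{NaAlSiO_4\ (nepheline)}
  \;\longrightarrow\; 2\,\mathrm{NaAlSi_2O_6\ (jadeite)}
```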
Jahn-Teller effect A distortion of a highly symmetrical molecule, which reduces its symmetry and lowers its energy. The effect occurs for all nonlinear molecules in degenerate electronic states, the degeneracy of the state being removed by the effect. It was first predicted in 1937 by H. A. Jahn and E. Teller. In early experimental work, the effect often “disappeared” or was masked by other molecular interactions. This has surrounded the Jahn-Teller effect with a certain mystery and allure, rarely found in science today. However, there are now a number of clear-cut experimental examples which correlate well with theoretical predictions. These examples range from the excited states of the most simple polyatomic molecule, H3, through moderate-sized organic molecules, like the ions of substituted benzene, to complex solid state phenomena involving crystals or localized impurity centers. See DEGENERACY (QUANTUM MECHANICS); MOLECULAR STRUCTURE AND SPECTRA; QUANTUM MECHANICS. Square planar molecule example. The fact that electronic degeneracy may destroy the symmetry upon which it is based is most easily demonstrated for molecules with fourfold symmetry. Consider a square planar molecule (with fourfold symmetry, D4h) with an unpaired electron in an orbitally degenerate state (for example, Eu). This state will be characterized by two orthogonal wavefunctions ψ A and ψ B and electron density distributions with mutually perpendicular nodal planes, as shown schematically in Fig. 1a and b. As long as the nuclei are undisplaced, the molecules in Fig. 1a and 1b are geometrically congruent and the energy of the system will be the same, regardless of whether the electron is in orbital ψ A or ψ B, that is, EA(0) = EB(0). Imagine the molecule undergoing an in-plane bending vibration (B2gν 4) and consider the nuclei displaced an incre-
ment +q along the appropriate normal coordinate Q. It is easily seen that the positions of the nuclei with respect to the electronic distributions characterized by ψ A and ψ B will not be equivalent, and the energies of the states A and B will differ, as shown on the right side of the potential energy diagram, Fig. 1c. The same argument will, of course, hold for the opposite phase of the vibration and displacement −q as shown on the left side of Fig. 1c. It is easily seen that the molecule in Fig. 1a with displacement + q is geometrically congruent with the molecule in Fig. 1b for displacement −q. It can therefore be deduced that for the electronic energies EA(q) = EB(−q) and similarly EA(−q) = EB(q). The establishment of these five points determines the general form of the JahnTeller potential curve in Fig. 1c. The potential surfaces of the states A and B must cross. The crossing point at zero displacement, with the corresponding energy E(0), is clearly not an absolute minimum; therefore, the totally symmetric square configuration will not be a stable equilibrium for the degenerate electronic state. Therefore, at equilibrium, the molecule will be distorted from a square, and its energy will be lowered. See MOLECULAR ORBITAL THEORY. The above arguments are not restricted to square molecules. With the exception of linear molecules which suffer Renner-Teller effects, all polyatomic molecules of sufficiently high symmetry to possess
Fig. 1. Jahn-Teller effect for a square planar molecule. Displacement of the nuclei of the molecule in the electronic field provided by (a) the wavefunction ΨA and (b) ΨB. (c) Energy E corresponding to ΨA and ΨB as a function of the displacement coordinate Q.
orbitally degenerate electronic states will be subject to the Jahn-Teller instability. However, in cases other than the fourfold symmetry discussed above, the proof is somewhat more involved and requires the use of the principles of group theory. See GROUP THEORY; RENNER-TELLER EFFECT. Vibrational structure. The distortion of the electronic potential surface of a degenerate electronic state by the Jahn-Teller effect has been considered. The nuclear motion on such a distorted potential surface leads to perturbations in the vibrational structure. These perturbations result in shifts and splittings of the vibrational levels. While group theory, as noted above, can demonstrate the susceptibility of a given molecule to Jahn-Teller instability, it gives no information about the magnitude of the effect; such information must come from experiments. The hexagonal benzene molecule, C6H6, and its symmetrically substituted derivatives, represent almost ideal examples of highly symmetrical molecules. Their least tightly bound electrons reside in a pair of degenerate π orbitals. The removal of one of these electrons produces a benzene positive ion whose ground state is electronically degenerate and so fulfills the criteria for a Jahn-Teller effect. These ions have been studied extensively experimentally and represent excellent examples with which to illustrate vibrational effects. For molecules with the sixfold symmetry (D6h) of a regular hexagon, group theoretical arguments demonstrate that Jahn-Teller distortions of the nuclei occur for motions along four doubly degenerate e2g vibrational normal modes denoted ν15–ν18. These are shown schematically in Fig. 2, where the arrows represent the cartesian components of the atomic displacements. The three-dimensional form of the potential energy E with respect to any one of these doubly degenerate coordinates Qi is shown in
Fig. 2. The four e2g vibrational modes which are Jahn-Teller-active for the positive ion of any substituted benzene with D6h symmetry: (a) C-C stretch, (b) C-F stretch, (c) C-C-C bend, and (d) C-F bend.
Fig. 3a; Fig. 3b shows cuts through this type of surface assuming various magnitudes for the Jahn-Teller distortion and illustrates how the introduction of the Jahn-Teller effect changes the potential surface. On the far left with no Jahn-Teller effect, the doubly degenerate electronic potential has a minimum at the symmetrical configuration (a regular hexagon). As the Jahn-Teller effect increases, the electronic potentials split and lowered minima (Jahn-Teller stabilization) in the potential occur at distorted nuclear configurations. The Jahn-Teller effect is usually characterized in terms of a linear parameter D defined by D = EJT/ω, where EJT is the Jahn-Teller stabilization energy (the separation between the central maximum and the distorted minima) and ω is the energy of the vibrational mode leading to the instability. As long as only the linear Jahn-Teller effect (wherein only terms linear in the displacement coordinate appear in the hamiltonian) is considered, the distorted potential of the D6h species in Fig. 3a exhibits an equipotential moat around the central maximum. The quadratic Jahn-Teller effect (caused by terms quadratic in the displacement coordinate) produces the threefold local minima and saddle points illustrated in Fig. 3a, and obviously requires the introduction of an additional coupling parameter, analogous to D, for its characterization. For small distortion when D is much less than 1 and quadratic terms are insignificant, approximate formulas describing the vibronic levels have been derived. For the general case when the distortion is large (D approximately 1 or greater) or when terms quadratic in Q are important, no explicit formulas can usually be given. More sophisticated techniques are needed to calculate the vibrational structure, and usually involve the diagonalization of large matrices. Such analysis shows that for the substituted benzenoid cations mentioned above the linear JahnTeller parameter D is not greater than about 1.0, while quadratic and higher terms are almost negligible. The energy reduction, that is, the Jahn-Teller stabilization energy EJT, is about 2–3 kcal/mole (8–12 kJ/mole) for these ions. This energy reduction is obtained with the benzene ring distorted from a regular hexagon by 2–3◦ in its internal angles, and with the C-C bond lengths differing by about 4 picometers. Static and dynamic effects. The terms static and dynamic Jahn-Teller effect are frequently used, but unfortunately their use is not standardized. The term static Jahn-Teller effect is often used to describe the distortion of the electronic potential as seen in Figs. 1c and 3, with corresponding use of dynamic JahnTeller effect to describe vibrational motion on this surface. However, other uses of the terms static and dynamic involve the magnitude of the Jahn-Teller effect and the time scale of the experiments used to measure its properties. As long as the distortion is small (D less than 1), all vibrational levels ν will be located above the central maximum (Fig. 3b). On the time scale of most experiments, a time-averaged, symmetrical configuration will be sampled, and the
Jahn-Teller effect term dynamic Jahn-Teller effect is therefore often applied. For a sufficiently large distortion (D greater than 1), one or more vibrational levels may be localized in separate, disjoint minima, with interconversions between them being slow. This situation (D greater than 1) correspondingly would be referred to as the static Jahn-Teller effect. The distortion will therefore manifest itself as a static or dynamic effect depending on the time scale of the experiment. Further complication arises when the Jahn-Teller active mode is itself degenerate, as in the case of the e2g vibrations of Fig. 2. In this case the switching between the potentials U and U of Fig. 3 can occur by pseudorotation within the moat, without ever surmounting the symmetric maximum. Then large quadratic terms, resulting in deep minima around the moat, are required to produce a really static effect. Solids. While the Jahn-Teller effect was originally postulated for highly symmetric discrete molecules, it can also appear in crystals and other highly symmetric extended systems. Searches for the observable consequences of the Jahn-Teller effect in molecular spectra were initially hampered by the fact that typically only excited electronic states of stable molecules or states of transient species—molecular ions and free radicals—possess the orbital degeneracy required for its observation. Most of the early experimental observations therefore involved optical or microwave spectra of point defects and impurity centers in solids. See CRYSTAL DEFECTS. In treating these systems, usually only the impurity or defect and its nearest-neighbor atoms are considered, and the rest of the solid is regarded as a featureless heat bath. As an example, Fig. 4a shows schematically a simple cluster, consisting of an electron in a p state trapped at a negative ion vacancy in an alkali halide (a so-called F-center), surrounded by a regular octahedron of its nearest neighbor positive ions. In the undistorted lattice, the three possible states, px, py, and pz, will be obviously degenerate. Now suppose the cluster undergoes an asymmetric vibration and the ions start moving in the directions indicated by the arrow. It is easily seen that the energy of the pz state will be initially lowered, since the positive ions will approach more closely the negative lobes of the pz wave function, while, by a similar argument, those of the py and px states will be raised (Fig. 4b). The surrounding crystal will counterbalance the distortion, with the corresponding energy increase being, by Hooke’s law, initially quadratic in displacement. As in the molecular case, it is easy to see that an energy minimum for the Fcenter electron will not be in the symmetric configuration but at some finite displacement. See COLOR CENTERS. Besides the optical and microwave spectra, the consequences of the distorted geometry are also observable in electron spin resonance and Zeeman spectra. When, for instance, a magnetic field H is applied parallel to the z axis of Fig. 4a, the degeneracy of the p state is split into three states with energies of approximately −βH, corresponding to the three possible angular momentum pro-
Fig. 3. Potential energy for a doubly degenerate, Jahn-Teller active vibration of a molecule with D6h or D3h symmetry. (a) Potential energy surface (branches U′ and U″ plotted against the displacement coordinates Q1 and Q2). (b) Cuts through the surface for increasing magnitude of the Jahn-Teller effect going from left to right (D = 0, D = 0.5, D = 2.5). Horizontal lines represent vibrational levels, which are numbered with the index V for the undistorted molecule.
jection eigenvalues of Lz = +1, 0, −1, with β the Bohr magneton. In the Jahn-Teller distorted case, the electron can no longer rotate freely; it has to “drag” its distortion with it. This increases its effective mass and quenches its orbital contribution to the magnetic splitting. Such reductions have been observed in the Zeeman spectra of numerous transition-metal ions in crystals. The magnitude of this reduction is dependent on the Jahn-Teller stabilization energy. At absolute zero (0 K) it was shown to be approximately exp (−3EJT/2ω). See ELECTRON PARAMAGNETIC RESONANCE (EPR) SPECTROSCOPY; ZEEMAN EFFECT. In some ions the electron-lattice interaction is so weak that even though their ground states are orbitally degenerate, Jahn-Teller effects are ordinarily not detected when they result from isolated impurities. In crystals containing large concentrations
Jamming Intentional generation of interfering signals in the electromagnetic and infrared spectrum by powerful transmitters as a countermeasure intended to block a communication, radar, or infrared system or to impair its effectiveness appreciably. Radio broadcasts or radio messages can be jammed by beaming a more powerful signal on the same frequency at the area in which reception is to be impaired, using carefully selected noise modulation to give maximum impairment of intelligibility of reception. When stations on many different frequencies are to be jammed or when an enemy is changing frequencies to avoid jamming, the jamming transmitter is correspondingly changed in frequency or swept through a range of frequencies over and over again. Techniques similar to this are also used at radar frequencies to jam early-warning and gunfire-control radar systems. See ELECTRONIC WARFARE. John Markus; Paul J. DeLia
Fig. 4. Jahn-Teller effect in an F-center. (a) The p orbitals of the F-center surrounded by an octahedron of positive ions. (b) The potential energy E of the p orbitals as the positive ions are displaced, as a function of displacement coordinate Q.
of these ions, the ion-ion interactions can greatly enhance these effects. Many rare-earth compounds have been found to undergo second-order phase transitions in which the overall crystal symmetry is reduced and the electronic degeneracy removed in a manner expected for the Jahn-Teller effect. Such distortions, involving cooperation of many ions, are usually called the cooperative Jahn-Teller effect. These minute distortions may cause shifts in phonon modes, which are observable in Raman spectra or neutron diffraction experiments. See NEUTRON DIFFRACTION; RAMAN EFFECT. V. E. Bondybey; Terry A. Miller Bibliography. I. B. Bersuker, The Jahn-Teller Effect, 2006; C. C. Chancey and M. C. M. O’Brien, The Jahn-Teller Effect in C60 and Other Icosahedral Complexes, 1997; R. Englman, The Jahn-Teller Effect in Molecules and Crystals, 1972; T. A. Miller and V. E. Bondybey (eds.), Molecular Ions: Spectroscopy, Structure and Chemistry, 1983; Yu. Perlin and M. Wagner, Modern Problems in Condensed Matter Sciences: The Dynamical Jahn-Teller Effect in Localized Systems, 1985.
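As a numerical supplement to the Jahn-Teller potential surfaces of Fig. 3 discussed above, the following minimal sketch (not part of the original article; the functional form and all parameter values are assumptions chosen for illustration) evaluates the two adiabatic branches of a doubly degenerate state with purely linear coupling along a single cut, U±(Q) = ½Q² ± k|Q| in units of the vibrational quantum ω, with the coupling constant k fixed by the linear parameter D through EJT = k²/2 = Dω.

```python
# Illustrative sketch only: one-dimensional cuts through the adiabatic
# potentials of a linearly coupled, doubly degenerate state, in units where
# the vibrational quantum (omega) = 1.  The coupling constant k is set by the
# dimensionless Jahn-Teller parameter D = E_JT / omega, i.e. k = sqrt(2 * D).

import numpy as np

def branch_energies(q, D):
    """Return the upper and lower adiabatic branches U+(Q) and U-(Q)."""
    k = np.sqrt(2.0 * D)          # linear vibronic coupling constant (assumed form)
    harmonic = 0.5 * q**2         # undistorted harmonic potential
    return harmonic + k * np.abs(q), harmonic - k * np.abs(q)

q = np.linspace(-4.0, 4.0, 801)   # dimensionless displacement coordinate

for D in (0.0, 0.5, 2.5):         # the three cuts shown in Fig. 3b
    upper, lower = branch_energies(q, D)
    q_min = q[np.argmin(lower)]   # position of the distorted minimum
    e_jt = abs(lower.min())       # Jahn-Teller stabilization energy
    print(f"D = {D}: |Q_min| = {abs(q_min):.2f}, E_JT = {e_jt:.2f} omega")
```

For D = 0 the lower branch keeps its single minimum at the symmetric configuration; for D = 0.5 the sketch gives minima at |Q| = 1 with EJT = 0.5ω; and for D = 2.5 the minima deepen to 2.5ω, the regime in which low-lying vibrational levels sit below the central maximum and the distortion behaves statically on most experimental time scales.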
Jasper An opaque, impure type of massive fine-grained quartz that typically has a tile-red, dark-brownish-red, brown, or brownish-yellow color. The color of the reddish varieties of jasper is caused by admixed, finely divided hematite, and that of the brownish types by finely divided goethite. Jasper has been used since ancient times as an ornamental stone, chiefly for inlay work, and as a semiprecious gem material. Under the microscope, jasper generally has a fine, granular structure, but fairly large amounts of fibrous or spherulitic silica also may be present. Jasper has a smooth conchoidal fracture with a dull luster. The specific gravity and hardness are variable, depending upon particle size and the nature and amount of the impurities present; both values approach those of quartz. The color of jasper often is variegated in banded, spotted, or orbicular types. Heliotrope is a translucent greenish chalcedony containing spots or streaks of opaque red jasper, and jaspagate contains bands of chalcedonic silica alternating with jasper. Jaspilite is a metamorphic rock composed of alternating layers of jasper with black or red bands of hematite. See CHALCEDONY; GEM; QUARTZ. Clifford Frondel
Jaundice The yellow staining of the skin and mucous membranes associated with the accumulation of bile pigments in the blood plasma. Bile pigments are the normal result of the metabolism of blood pigments, and are normally excreted from the blood into the bile by the liver. An increase in circulating bile pigments can, therefore, come about through increased breakdown of blood (hemolytic jaundice), through lack of patency of the bile ducts (obstructive jaundice), through inability or failure of the liver to clear
Jaundice the plasma (parenchymal jaundice), or through combinations of these. See GALLBLADDER; LIVER. Metabolic pathway. Metabolism of the hemoglobin from destroyed red blood cells is carried on in organs of the reticuloendothelial system, such as the spleen, and the resulting bilirubin is liberated to the plasma. The plasma then circulates through the liver, where the bilirubin is conjugated enzymatically with glucuronic acid, and is excreted in the bile. Bile travels through the bile ducts to the small intestine, whence a small amount of altered bilirubin, termed urobilinogen, may be reabsorbed into the plasma. Excessive destruction of red blood cells causes accelerated production of bilirubin, overloads the ability of the liver to remove the pigment from the circulation, and produces jaundice. Blockage of the bile ducts causes elevation of plasma bilirubin glucuronide level because of the inability to dispose of this material in the usual channel. Damage to liver cells may cause elevation of the plasma bilirubin or bilirubin glucuronide or both, depending on the type and severity of liver cell damage. Although the major portion of circulating bilirubin is derived from the breakdown of red cells, some is also contributed through inefficient or incomplete utilization of precursors of hemoglobin, which spill into the plasma bilirubin pool without having been used for hemoglobin synthesis. In addition, metabolism of cytochromes and similar pigmented compounds found in all cells of the body yields small quantities of bilirubin. See HEMOGLOBIN. Jaundice occurs when the level of these circulating pigments becomes so high that they are visible in the skin and mucous membranes, where they are bound by a reaction which has not been identified. In the normal adult, levels of total bilirubin, that is, the total bilirubin and bilirubin glucuronide, rarely exceed 0.8–1.0 mg/100 ml of plasma, while jaundice usually becomes visible when total bilirubin approaches 1.5 mg. See BILIRUBIN. Hemolytic jaundice. Destruction of red blood cells in the normal human adult proceeds at a rate at which about 0.8% of the circulating hemoglobin is broken down each day. This can be increased in states of excessive hemolysis up to 10- to 15-fold without overtaxing the remarkable ability of the liver to clear bilirubin from the plasma. Even this rate of clearing can be exceeded, however, in certain morbid states which result in hemolytic jaundice; such states include various hemolytic anemias, hemolysis resulting from incompatible blood transfusion, severe thermal or electric injuries, or introduction of hemolytic agents into the bloodstream. Similar jaundice occurs in pulmonary infarction. In infants, and especially in premature infants, the ability of the liver to conjugate bilirubin with glucuronide is much less than in adults, apparently because of the lack of suitable enzymes. Jaundice appears in many infants shortly after birth, then disappears within a few days, with development of the appropriate enzyme structure. In the uncommon constitutional hepatic dysfunction, this enzyme defect apparently persists into adult life. The infantile
jaundice accompanying erythroblastosis fetalis (Rh babies) is due to the inability of the infantile liver to metabolize the bilirubin resulting from markedly accelerated hemolysis. See BLOOD GROUPS; RH INCOMPATIBILITY. A related form of jaundice occurs when an abnormality exists in the hemoglobin-forming cells of the bone marrow in which inadequate utilization of hemoglobin precursors occurs. Moderate degrees of jaundice can result from the accumulation of these substances in the bloodstream. Obstructive jaundice. The highest levels of total bilirubin are seen in chronic obstructive jaundice, in which plasma levels may reach 50–60 mg/ 100 ml, and the skin may take on a remarkable deepyellow hue. This condition may be brought about through a variety of means. In the infant there may be a severe maldevelopment of the bile ducts such that no channel for the flow of bile exists, while in the adult obstructive jaundice is most commonly caused by impaction of a gallstone in the ducts. Benign and malignant tumors of the gallbladder, bile ducts, pancreas, lymph nodes, and other organs may also cause compression of the bile ducts with loss of patency, and bile duct stricture may follow surgery or inflammation in the region. A similar, less severe, reversible picture is seen as a hypersensitivity response to the administration of some drugs, the most common of which are chlorpromazine and related drugs, and methyl testosterone. In the uncommon benign disorder known as idiopathic familial jaundice, there appears to be decreased ability to excrete conjugated bilirubin into the bile ducts, giving rise to a constant, but usually quite slight, elevation in the plasma bilirubin glucuronide. See GALLBLADDER DISORDERS. Parenchymal jaundice. A wide variety of diseases exists in which part of the jaundice can be accounted for by actual damage to liver cells, with consequent decrease in their ability to conjugate bilirubin and excrete the glucuronide, causing an elevation of both fractions in the plasma. This group comprises such conditions as inflammations of the liver, including viral hepatitis, Weil’s disease, syphilis, parasitic infestations, and bacterial infections; toxic conditions, including poisoning from a wide variety of organic and inorganic compounds and in a broader sense the toxemias associated with severe systemic diseases; tumorous conditions, including primary hepatic tumors and those metastatic from other organs; and other conditions, the most common of which is congestive heart failure. Some of these conditions have an added component of obstructive or hemolytic jaundice, which confuses the picture for the clinician. Symptomatology. The appearance and symptomatology of subjects suffering from jaundice vary from case to case and depend on the underlying disease process producing the jaundice. Elevation of bilirubin by itself has very limited deleterious effects on overall physiology, with two exceptions: The brain of very young infants is subject to damage by high levels of circulating bilirubin (a condition termed
kernicterus); and, under certain conditions in the adult, high levels of circulating bilirubin appear to contribute to kidney damage. Aside from the effects of the jaundice itself, however, individuals with hemolytic jaundice usually have an accompanying anemia. Those with obstructive jaundice commonly note that their stools are not brown, a symptom caused by lack of bile pigments (acholic stools), while bilirubin glucuronide appears in the urine and causes it to turn dark. Persons with malignancies demonstrate the usual signs of weight loss and anemia, while those with inflammatory conditions commonly have fever, chills, and prostration. See ANEMIA; PIGMENTATION. Rolla B. Hill
Jawless vertebrates The common name for members of the agnathans. Jawless vertebrates include the cyclostomes (modern lampreys and hagfishes) as well as extinct armored fishes, known colloquially as ostracoderms (“bony-skinned”), that lived in the Ordovician, Silurian, and Devonian periods (Fig. 1). Agnathans have pouchlike gills opening through small pores, rather than slits as in jawed vertebrates (Gnathostomata). Primitively, agnathans lack jaws, they show a persistent notochord, and most have no paired fins. See CYCLOSTOMATA (CHORDATA). Types. The lampreys (Petromyzontiformes; 38 extant species) and hagfishes (Myxiniformes; ∼70 species) are scaleless, eel-shaped fishes with round mouths, inside which are keratinized teeth carried upon a complex tongue and oral hood. Parasitic lampreys attach to host fishes by a sucker and use the tongue to rasp away flesh and blood. Hagfishes are exclusively marine, are blind, and live most of their lives buried in mud, emerging to eat
polychaete worms, mammal carcasses, or dead or dying fishes. Lampreys and hagfishes show a bipolar distribution (they occur in both Northern and Southern hemispheres, but only 4 species occur in the Southern) and prefer cool waters. See MYXINIFORMES; PETROMYZONTIFORMES. Ostracoderms were very variable in shape and were covered with a superficial bony armor made up of solid shields or scales. One of the best known was the osteostracans (cephalaspids) of Eurasia and North America, with a solid semicircular head shield pierced by dorsally placed eyes and a small circular mouth on the undersurface. The head shield is also marked by sensory fields (lacunae filled with small plates) which were specialized parts of the lateralline system. Osteostracans probably lived on, or were partly buried in, mud and sand, where they sucked in small food particles. See CEPHALASPIDOMORPHA; OSTEOSTRACI; OSTRACODERM. Galeaspids were superficially like osteostracans and probably occupied a similar ecological niche in China. The head shield in many was extended into an elaborate snout. Heterostracans (pteraspids) were torpedo-shaped active swimmers that lived alongside osteostracans. The head was encased in large dorsal and ventral shields which surrounded small, laterally directed eyes. The mouth, formed by a lower lip equipped with small bony plates, may have been extended as a scoop to ingest food particles. A few ostracoderms superficially look like heterostracans and sometimes are classified with them because they show a similar type of bone which lacks enclosed osteocytes (bone cells). Heterostracans are the oldest of completely preserved vertebrates and lived in seas 460 million years ago. See HETEROSTRACI. The Thelodonti are a group of small ostracoderms in which the body is completely covered with
Interrelationships of jawless and jawed vertebrates, showing distribution of some major evolutionary steps. (Cladogram relating hagfishes, lampreys, conodonts, anaspids, thelodonts, heterostracans, galeaspids, osteostracans, and jawed vertebrates; the groupings labeled are cyclostomes, ostracoderms, and agnathans, and the characters indicated include a complex brain and sensory organs, hard tissues formed of calcium phosphate, the beginnings of a true vertebral column and large eyes, scales and/or bony shields, a bony braincase, and paired pectoral fins.)
Jerboa minute, finely sculptured scales. These scales are found worldwide and in great abundance in Upper Silurian and Lower Devonian rocks. Consequently, they have been used for stratigraphic zonation and correlation. Some scientists think that thelodonts may be closely related to sharks and rays (Chondrichthyes). See CHONDRICHTHYES. Interrelationships. Modern ideas of the interrelationships of agnathan fishes suggest that they are not a natural group; that is, some such as the osteostracans and galeaspids are more closely related to jawed vertebrates than to other agnathans. Similarly, lampreys share many specializations with jawed vertebrates not seen in hagfishes, such as neural arches along the notochord, eye muscles, nervous control of heartbeat, and the capability to osmoregulate and adapt to freshwater. These attributes suggest that lampreys are more closely related to jawed vertebrates than to hagfishes and that the Cyclostomata are not a natural group. Recently, Conodonta have been classified as vertebrates. Conodonts are preserved as tiny, phosphatic, toothlike elements found in marine deposits from the Cambrian to Triassic. Usually found in great abundance, they are useful for dating and correlating rocks. Rare fossils of soft-bodied, eel-shaped animals show that conodonts are part of complex feeding devices within the mouth. They grew to at least 5 cm (2 in.) (some incompletely known forms must have been considerably longer) and had large eyes, suggesting an active predatory lifestyle. See CONODONT. A classification of jawless vertebrates is given below: Subphylum Craniata (complex brain and sensory organs) Class Hyperotreti Order Myxiniformes Class Vertebrata (with neural arches in backbone) Order: Petromyzontiformes Conodonta Anaspida Thelodonti Heterostraci (pteraspids) Galeaspida Osteostraci (cephalaspids) Peter L. Forey Bibliography. P. C. J. Donoghue et al., Conodont affinity and chordate phylogeny, Biol. Rev., 75:191– 251, 2000; P. L. Forey and P. Janvier, Evolution of the early vertebrates, Amer. Sci., pp. 554–565, November–December 1994; P. Janvier, Early Vertebrates, Clarendon Press, Oxford, 1996.
TABLE 1. Genera and common names of jerboas

Genus           Common name
Cardiocranius   Five-toed dwarf jerboa (1 species)
Salpingotus     Three-toed dwarf jerboas (5 species)
Salpingotulus   Baluchistan pygmy jerboa (1 species)
Paradipus       Comb-toed jerboa (1 species)
Euchoreutes     Long-eared jerboa (1 species)
Dipus           Rough-legged jerboa, or northern three-toed jerboa (1 species)
Eremodipus      Central Asian jerboa (1 species)
Jaculus         Desert jerboas (4 species)
Stylodipus      Thick-tailed three-toed jerboas (3 species)
Allactaga       Four- and five-toed jerboas (11 species)
Allactodipus    Bobrinski's jerboa (1 species)
Alactagulus     Lesser five-toed jerboa, or little earth hare (1 species)
Pygeretmus      Fat-tailed jerboas (2 species)
Jerboa The common name for 33 species of rodents in the family Dipodidae (see table). Jerboas inhabit the hot arid desert and semiarid regions of Turkey, North Africa, and Asia. See DESERT; RODENTIA. Description. Jerboas are rodents ranging 2–8 in. (50–200 mm) in head and body length (see
illustration). The rounded tail is usually about 2 in. (50 mm) longer than the head and body. The ears are large, equaling half the length of the head and body in some species. The eyes are large. Coloration varies but is usually some shade of buff (moderate orange-yellow) mixed with black or russet; the underparts are white. Of all the rodents, this kangaroolike mammal is perhaps the most highly developed for getting around on two feet. The hindlimbs are large and very long (at least four times the length of the front limbs). In most species the three main foot bones are fused into a single “cannon bone” for greater strength. The front limbs are considerably smaller and are used primarily for grooming and for holding the grass seeds on which many species feed. Jerboas prefer to stand up on their hindlegs and hop, covering 2–6 ft (0.6–1.8 m) in a single jump. When alarmed, a jerboa can travel by leaps and bounds of 5–10 ft (1.5–3 m), considerably faster than a human can run. Allactaga elater has been timed at a speed of 48 km/h (30 mi/h). The tail, which ends in a tuft of hair, not only gives the jerboa support when it is standing but also helps the animal maintain its balance as it leaps along. The thick hairs on the feet absorb some of the landing shock and provide traction on the sand.
Typical jerboa.
Jet (gemology) The largest of all jerboas is the four-toed jerboa (Allactaga major). Head and body length may range 90–263 mm (3.5–10 in.), tail length is 142–308 mm (5.5–12 in.), and weight is 280–420 grams (9.75– 14.75 oz). Behavior. Jerboas are nocturnal in order to escape the intense heat of the day. They spend the day holed up in their underground den. When they retire for the day, they close up the burrow entrance to keep out the hot sunlight. Four types of burrows are utilized by different species of jerboas: temporary summer day burrows, temporary summer night burrows, permanent summer burrows used as living quarters and for producing young, and permanent winter burrows for hibernation. Temporary shelters during the day may consist of a long but simple tunnel up to 13 m (42 ft) in length and 20–30 cm (8– 12 in.) deep. Some species are solitary, while others are more social. Jerboas in Russia, Ukraine, and Central Asia may hibernate for 6–7 months, whereas jerboas in more southern regions may not hibernate at all. Some species feed primarily on seeds and other plant material such as bulbs, roots, and rhizomes (underground horizontal stems); others are mainly insectivorous. Some fluids are obtained through their food, but jerboas also produce fluid internally as a result of metabolic processes (“metabolic water”). See METABOLISM. Breeding extends from about March to October with females giving birth to two or three litters annually. Most litters contain three to five young. The gestation period is 25–40 days. Females of some species attain sexual maturity in 31/2 months; therefore, a female born early in the breeding season could give birth to her own litter during the same breeding season. Several species of jerboas are classified as endangered, threatened, or vulnerable by the International Union for the Conservation of Nature and Natural Resources (IUCN) because of their restricted range and/or habitat disturbance. Donald W. Linzey Bibliography. D. Macdonald (ed.), The Encyclopedia of Mammals, Andromeda Oxford, 2001; R. M. Nowak, Walker’s Mammals of the World, 6th ed., Johns Hopkins University Press, 1999.
Jet flow A fluid flow in which a stream of one fluid mixes with a surrounding medium, at rest or in motion. Such flows occur in a wide variety of situations, and the geometries, sizes, and flow conditions cover a large range. To create some order, two dimensionless groupings are used. The first is the Reynolds number, defined in Eq. (1), where ρ is the density, V

Re ≡ ρVL/µ        (1)

is a characteristic velocity (for example, the jet exit velocity), L is a characteristic length (for example, the jet diameter), and µ is the viscosity. The second is the Mach number, defined in Eq. (2), where a is

M ≡ V/a        (2)
the speed of sound. Jet flows vary greatly, depending on the values of these two numbers. See MACH NUMBER; REYNOLDS NUMBER. Laminar and turbulent flows. Jets exiting into a similar fluid (liquid into liquid, or gas into gas) almost always become turbulent within a short distance of their origin, irrespective of whether the flow in the supply line is laminar or turbulent. For example, a water jet exiting into water at rest with Re ≈ 2300 (Fig. 1) is initially in the simple laminar state, but at
Jet (gemology) A black, opaque material that takes a high polish. Jet has been used for many centuries for ornamental purposes. It is a compact variety of lignite coal. It has a refractive index of 1.66 (which accounts for its relatively high luster), a hardness of 3–4 on Mohs scale, and a specific gravity of 1.30–1.35. Jet is compact and durable, and can be carved or even turned on a lathe. The principal source is Whitby, England, where it occurs in hard shales. Although popular from Pliny’s day until the nineteenth century, the use of jet for jewelry purposes declined markedly for centuries until a resurgence of interest occurred in the 1960s. See GEM; LIGNITE. Richard T. Liddicoat, Jr.
Fig. 1. Laser-induced fluorescence of a round dyed water jet directed downward into water at rest, showing the instantaneous concentration of injected fluid in the plane of the jet axis. The Reynolds number is approximately 2300. (From P. E. Dimotakis, R. C. Lye, and D. Z. Papantoniou, Proceedings of the 15th International Symposium on Fluid Dynamics, Jachranka, Poland, 1981)
Jet stream this Reynolds number that state is unstable and the flow undergoes a transition to the more chaotic turbulent state. Turbulent structures called eddies are formed with a large range of sizes. The large-scale structures are responsible for capturing fluid from the surroundings and entraining it into the jet. However, the jet and external fluids are not thoroughly mixed until diffusion is completed by the small-scale structures. A more stable case, which may remain laminar for a long distance, is where a liquid jet issues into air or other light gas. See DIFFUSION; LAMINAR FLOW; TURBULENT FLOW. High-speed flows. When the velocities in the jet are greater than the speed of sound (M > 1) the flow is said to be supersonic, and important qualitative changes in the flow occur. The most prominent change is the occurrence of shock waves. For example, a supersonic air jet exhausting from a nozzle at low pressure into higher-pressure air at rest is said to be overexpanded. Such a flow can be visualized by an optical technique called schlieren photography (Fig. 2). As the jet leaves the nozzle, it senses the higher pressure around it and adjusts through oblique shock waves emanating from the edges of the nozzle. As the waves approach the jet axis, they are focused and strengthen to form a vertical Mach disk, visible in the middle of the jet. The pattern repeats itself until dissipated by viscous effects. If the supersonic jet exits the nozzle at higher pressure than the surroundings, it will plume out. Such behavior is observed as the space shuttle rises to high altitudes with the main engines running. See SCHLIEREN PHOTOGRAPHY; SHOCK WAVE; SUPERSONIC FLOW. Plumes. Another class of jet flows is identified by the fact that the motion of the jet is induced primarily by buoyancy forces. A common example is a hot gas exhaust rising in the atmosphere. Such jet flows are called buoyant plumes, or simply plumes, as distinct from the momentum jets, or simply jets, discussed above. These flows become very interesting when the surrounding fluid is stratified in different density layers, as can happen in the atmosphere and ocean by heating from the Sun. A heated plume may then rise rapidly until it reaches a layer of gas at the same density, where the buoyancy force due to
Fig. 2. Schlieren photograph of an overexpanded Mach 1.8 round air jet exiting from left to right into ambient air. The adjustment of the lower pressure in the surroundings leads to oblique shock waves and Mach disks in the jet. (From H. Oertel, Modern Developments in Shock Tube Research, Shock Tube Research Society, Japan, 1975)
temperature difference will drop to zero. The plume will then spread out in that layer without further rise. Such situations can be important for pollutiondispersal problems. See AIR POLLUTION; BOUNDARYLAYER FLOW; FLUID FLOW. Joseph A. Schetz Bibliography. G. N. Abramovich, Theory of Turbulent Jets, 1963; J. A. Schetz, Boundary Layer Analysis, 1993; J. A. Schetz, Injection and Mixing in Turbulent Flow, 1980.
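As a brief worked illustration of the two dimensionless groupings defined in Eqs. (1) and (2) above, the short sketch below evaluates the Reynolds and Mach numbers for a small round air jet; the fluid properties, exit velocity, and nozzle diameter are assumed handbook-style values chosen for illustration, not data from the article.

```python
# Illustrative estimate (assumed values, not from the article): Reynolds and
# Mach numbers, Eqs. (1) and (2), for a small round air jet.

def reynolds_number(rho, V, L, mu):
    """Re = rho * V * L / mu, with L a characteristic length (jet diameter)."""
    return rho * V * L / mu

def mach_number(V, a):
    """M = V / a, with a the speed of sound in the jet fluid."""
    return V / a

rho = 1.2        # air density near 20 C, kg/m^3 (assumed)
mu = 1.8e-5      # dynamic viscosity of air, Pa*s (assumed)
a = 343.0        # speed of sound in air, m/s (assumed)
V = 50.0         # jet exit velocity, m/s (assumed)
L = 0.01         # nozzle diameter, m (assumed)

Re = reynolds_number(rho, V, L, mu)
M = mach_number(V, a)
print(f"Re = {Re:.0f}, M = {M:.2f}")   # roughly Re = 33000, M = 0.15
```

A Reynolds number of order 10^4 indicates that such a jet would become turbulent within a short distance of the nozzle, while a Mach number well below 1 means that the compressibility effects (shock waves, Mach disks) described above for supersonic jets do not arise.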
Jet stream A relatively narrow, fast-moving wind current flanked by more slowly moving currents. Jet streams are observed principally in the zone of prevailing westerlies above the lower troposphere and in most cases reach maximum intensity, with regard both to speed and to concentration, near the tropopause. At a given time, the position and intensity of the jet stream may significantly influence aircraft operations because of the great speed of the wind at the jet core and the rapid spatial variation of wind speed in its vicinity. Lying in the zone of maximum temperature contrast between cold air masses to the north and warm air masses to the south, the position of the jet stream on a given day usually coincides in part with the regions of greatest storminess in the lower troposphere, though portions of the jet stream occur over regions which are entirely devoid of cloud. The jet stream is often called the polar jet, because of the importance of cold, polar air. The subtropical jet is not associated with surface temperature contrasts, like the polar jet. Maxima in wind speed within the jet stream are called jet streaks. See CLEAR-AIR TURBULENCE. Characteristics. The specific characteristics of the jet stream depend upon whether the reference is to a single instantaneous flow pattern or to an averaged circulation pattern, such as one averaged with respect to time, or averaged with respect both to time and to longitude. If the winter circulation pattern on the Northern Hemisphere is averaged with respect to both time and longitude, a westerly jet stream is found at an elevation of about 8 mi (13 km) near latitude (lat) 25◦. The speed of the averaged wind at the jet core is about 80 knots (148 km/h). In summer the jet is displaced poleward to a position near lat 41◦. It is found at an elevation of about 7 mi (12 km) with a maximum speed of about 30 knots (56 km/h). In both summer and winter a speed equal to one-half the peak value is found approximately 15◦ of latitude south, 20◦ of latitude north, 3–6 mi (5–10 km) above, and 3–6 mi (5–10 km) below the location of the jet core itself. If the winter circulation is averaged only with respect to time, it is found that both the intensity and the latitude of the westerly jet stream vary from one sector of the Northern Hemisphere to another. The most intense portion, with a maximum speed of about 100 knots (185 km/h), lies over the extreme western portion of the North Pacific Ocean
Jewel bearing at about lat 22◦. Lesser maxima of about 85 knots (157 km/h) are found at lat 35◦ over the east coast of North America, and at lat 21◦ over the eastern Sahara and over the Arabian Sea. In summer, maxima are found at lat 46◦ over the Great Lakes region, at lat 40◦ over the western Mediterranean Sea, and at lat 35◦ over the central North Pacific Ocean. Peak speeds in these regions range between 40 and 45 knots (74 and 83 km/h). The degree of concentration of these jet streams, as measured by the distance from the core to the position at which the speed is one-half the core speed, is only slightly greater than the degree of concentration of the jet stream averaged with respect to time and longitude. At both seasons at all longitudes the elevation of these jet streams varies between 6.5 and 8.5 mi (11 and 14 km). Variations. On individual days there is a considerable latitudinal variability of the jet stream, particularly in the western North American and western European sectors. It is principally for this reason that the time-averaged jet stream is not well defined in these regions. There is also a great day-to-day variability in the intensity of the jet stream throughout the hemisphere. On a given winter day, speeds in the jet core may exceed 200 knots (370 km/h) for a distance of several hundred miles along the direction of the wind. Lateral wind shears in the direction normal to the jet stream frequently attain values as high as 100 knots per 300 nautical miles (185 km/h per 556 km) to the right of the direction of the jet stream current and as high as 100 knots per 100 nautical miles (185 km/h per 185 km) to the left. Vertical shears below and above the jet core as often as large as 20 knots per 1000 ft (37 km/h per 305 m). Daily jet streams are predominantly westerly, but northerly, southerly, and even easterly jet streams may occur in middle or high latitudes when ridges and troughs in the normal westerly current are particularly pronounced or when unusually intense cyclones and anticyclones occur at upper levels. Insufficiency of data on the Southern Hemisphere precludes a detailed description of the jet stream, but it appears that the major characteristics resemble quite closely those of the jet stream on the Northern Hemisphere. The day-to-day variability of the jet stream, however, appears to be less on the Southern Hemisphere. It appears that an intense jet stream occurs at high latitudes on both hemispheres in the winter stratosphere at elevations above 12 mi (20 km). The data available, however, are insufficient to permit the precise location or detailed description of this phenomenon. See AIR MASS; ATMOSPHERE; ATMOSPHERIC WAVES, UPPER SYNOPTIC; GEOSTROPHIC WIND; STORM; VORTEX . Frederick Sanders; Howard B. Bluestein
known as ruby or sapphire. The extensive use of such bearings in the design of precision devices is mainly due to the outstanding qualities of the material. Sapphire’s extreme hardness imparts to the bearing excellent wear resistance, as well as the ability to withstand heavy loads without deformation of shape or structure. The crystalline nature of sapphire lends itself to very fine polishing and this, combined with the excellent oil- and lubricant-retention ability of the surface, adds to the natural low-friction characteristics of the material. Sapphire is also nonmagnetic and oxidization-resistant, and has a very high melting point (3685◦F or 2029◦C). Ruby has the same properties as sapphire; the red coloration is due to the introduction of a small amount of chromium oxide. See ANTIFRICTION BEARING; GEM; GYROSCOPE; RUBY; SAPPHIRE; WATCH. Types. Jewel bearings, classified as either instrument or watch jewels, are also categorized according to their configuration or function. The ring jewel is the most common type. It is basically a journal bearing which supports a cylindrical pivot. The wall of the hole can be either left straight (bar hole) or can be imparted a slight curvature from end to end (olive hole). This last configuration is designed to reduce friction, compensate for slight misalignment, and help lubrication. A large variety of instrument and timing devices are fitted with such bearings, including missile and aircraft guidance systems. See GUIDANCE SYSTEMS. Vee, or V, jewels are used in conjunction with a conical pivot, the bearing surface being a small radius located at the apex of a conical recess. This type of bearing is found primarily in electric measuring instruments. Cup jewels have a highly polished concave recess mated to a rounded pivot or a steel ball. Typical are compass and electric-meter bearings. End stone and cap jewels, combined with ring jewels, control the end play of the pivot and support axial thrust. They consist of a disk with highly polished flat or convex ends. Other relatively common jewel bearings are pallet stones and roller pins; both are part of the time- keeping device’s escapement.
Jewel bearing A bearing used in quality timekeeping devices, gyros, and instruments; usually made of synthetic corundum (crystallized Al2O3), which is more commonly
Fig. 1. Automatic cupping machines for the manufacture of jewel bearings. (Bulova Watch Co.)
Johne’s disease Dimensions. Minute dimensions are a characteristic of jewel bearings. A typical watch jewel may be 0.040 in. (0.10 cm) in diameter with a 0.004-in. (0.01-cm) hole, but these dimensions may go down to 0.015 and 0.002 in. (0.038 and 0.005 cm), respectively. Jewels with a diameter of more than 1/√16 in. (0.16 cm) are considered large. It is usual for critical dimensions, such as hole diameter and roundness, to have a tolerance of 0.0001 in. (0.00025 cm) or less. In some instances these tolerances may be as low as 0.0000020 in. (0.000005 cm). Manufacturing. Because of its hardness sapphire can only be worked by diamond, which is consequently the main tool for the production of jewel bearings. Both natural and synthetic diamond are used, mostly under the form of slurry, broaches, and grinding wheels. See DIAMOND. The machining of the blanks, small disks, or cylinders of varied diameter and thickness is the first step in the manufacturing process, and is common to most types of jewel bearings. The boules (pearshaped crystals of synthetic ruby or sapphire) are first oriented according to the optical axis to ensure maximum hardness of the bearing working surface. They are then sliced, diced, and ground flat, and rounded by centerless grinding to the required blank dimensions. From this point on, the process varies considerably, according to the type of bearing. Ring jewels are drilled with a steel or tungsten wire and coarse diamond slurry, or bored with a small grinding tool. The hole is then enlarged and sized by reciprocating wires of increasingly larger sizes through a string of jewels, until the required hole size is achieved. Fine diamond powder and a very slow rate of material removal permit the respect of strict tolerances and high-finish quality requirements. The jewels, supported by a wire strung through the hole, are then ground in a special centerless-type grinding machine to the desired outside diameter dimension. After the cutting of a concave recess, which functions as an oil reservoir, the top and bottom of the bearing are polished and beveled by lapping and brushing. Finally, the “olive” configuration is ob-
Fig. 2. Individual head of an automatic cupping machine. (Bulova Watch Co.)
Fig. 3. Technician shown operating an automatic cupping machine. (Bulova Watch Co.)
tained by oscillating a diamond charged wire through the hole of the rotating jewel. Between each operation the boiling of the jewels in a bath of sulfuric and nitric acid disposes of remaining slurries and other contaminating products. The conical recess in vee jewels is first roughly shaped by a pyramidal diamond tool. The wall of the vee and the radius are then polished and blended with an agglomerated diamond broach, and a succession of brushing operations. Lapping of the top of the jewel and brushing a bevel around the upper outside edge conclude the process. Most other types of jewel bearings, such as end stones and pallet stones, are shaped by a series of grinding, lapping, and brushing operations. A full line of automatic and semiautomatic highprecision equipment has been developed to handle and machine jewel bearings efficiently, permitting mass production and relatively low cost (Figs. 1–3). Traditionally, a large proportion of the labor involved is devoted to in-process and final quality control. Robert M. Schultz
Johne’s disease A slowly progressive diarrheal disease that causes major economic loss to the cattle industry; also known as paratuberculosis. It is caused by Mycobacterium avium ss paratuberculosis (MAP), which produces chronic inflammation of the mucosa of the ileocecal valve and adjacent tissues of the gastrointestinal tract of cattle, sheep, goats, and wild ruminants. The organism has been isolated from 2–3% of the adult cows in the United States; in some exotic species (such as mouflon and mountain goats), more than 90% of the animals in a herd may be infected. Animals less than 2 months of age are most susceptible to infection. Therefore, eliminating or minimizing exposure of young animals to MAP is important in the control of Johne’s disease. It is recommended that calves be removed from dams immediately after birth and fed colostrum from negative animals. See MYCOBACTERIAL DISEASES. Transmission. Transmission of MAP is primarily by ingestion of feces from animals shedding the organism. The incubation period varies from 1 to 3 years
Joint (anatomy) or more. Diseased animals in early or subclinical stages have intermittent or persistent diarrhea without fever and in advanced stages become emaciated. Congenital infections have been reported, and the organism has been isolated from colostrum, supramammary lymph nodes, fetal tissues, blood, and semen of cattle with clinical disease. Diagnosis. Johne’s disease is diagnosed by mycobacteriologic examinations conducted on feces or tissues collected by biopsy or at necropsy. Enzymelinked immunosorbent assays (ELISA) have been developed using a purified protein derivative prepared from the culture filtrate of MAP or chemical extracts of the bacterium, including proteins and carbohydrates (antigens) for detecting antibodies in sera of animals exposed to the organism. ELISA reportedly fails to identify three of four contagious cows shedding MAP in feces. Moreover, animals in the early stages of disease are frequently negative for humoral antibodies on ELISA, complement fixation, and/or agar gel immunodiffusion tests. Therefore, to confirm a diagnosis of Johne’s disease, it is necessary to isolate and identify the etiologic agent by mycobactin dependency and/or polymerase chain reaction (PCR). Vaccination. A killed whole-cell vaccine available for use in calves 1–35 days of age markedly reduces and often eliminates the occurrence of clinical disease in adult cattle. Live attenuated strains of MAP have been used for vaccinating cattle in a few countries, but are not approved for use in the United States or Canada. Therapeutic drugs are not available for routine treatment of animals. Relation to Crohn’s disease. The superficial similarity of Johne’s disease to Crohn’s disease in humans has led to the hypothesis that Crohn’s disease is also caused by MAP. Although MAP has been isolated from some Crohn’s disease patients, no definitive information on association is available. Moreover, published information on Crohn’s disease in humans fails to show any association with bovine paratuberculosis. Available information indicates that the parasitic worm Trichuris suis may be useful in the treatment of patients with Crohn’s disease. This helminth may play a role in altering host mucosal immunity by inhibiting dysregulated inflammatory responses. See INFLAMMATORY BOWEL DISEASE. Charles O. Thoen Bibliography. O. Chacon, L. E. Bermudez, and R. G. Baretta, Johne’s disease, inflammatory bowel disease and Mycobacterium paratuberculosis, Annu. Rev. Microbiol., 58:329–363, 2004; P. H. Jones et al., Crohn’s disease in people exposed to clinical cases of bovine paratuberculosis, Epidemiol. Infect., pp. 1–8, January 2005; R. W. Summers et al., Trichuris suis therapy in Crohn’s disease, Gut, 54:87–89, 2005; C. O. Thoen and R. G. Barletta, Mycobacterium, in Pathogenesis of Bacterial Infections in Animals, pp. 70–77, Blackwell, 2004; C. O. Thoen and J. A. Haagsma, Molecular techniques in the diagnosis and control of paratuberculosis in cattle, J. Amer. Vet. Med. Ass., 209:734–737, 1996; J. V. Weinstock, R. W.
Summers, and D. E. Elliott, Role of helminths in regulating mucosal inflammation, Springer Seminars Immunopathol., 27:249–271, 2005.
Joint (anatomy) The structural component of an animal skeleton where two or more skeletal elements meet, including the supporting structures within and surrounding it. The relative range of motion between the skeletal elements of a joint depends on the type of material between these elements, the shapes of the contacting surfaces, and the configuration of the supporting structures. In bony skeletal systems, there are three general classes of joints: synarthroses, amphiarthroses, and diarthroses. Synarthroses are joints where bony surfaces are directly connected with fibrous tissue, allowing very little if any motion. Synarthroses may be further classified as sutures, syndesmoses, and gomphoses. Sutures are joined with fibrous tissue, as in the coronal suture where the parietal and frontal bones of the human skull meet; in adulthood, this suture becomes an essentially rigid bony union called a synostosis. Syndesmoses are connected with ligaments, as are the shafts of the tibia and fibula. The roots of a tooth that are anchored in the jaw bone with fibrous tissue form a gomphosis. Amphiarthroses are joints where bones are directly connected with fibrocartilage or hyaline cartilage and allow only limited motion. An amphiarthrosis joined with fibrocartilage, as found between the two pubic bones of the pelvis, is known as a symphysis; but when hyaline cartilage joins the bones, a synchondrosis is formed, an example being the first sternocostal joint. The greatest range of motion is found in diarthrodial joints, where the articulating surfaces slide and to varying degrees roll against each other. See LIGAMENT. Diarthrodial joints. The contacting surfaces of the bones of a diarthrodial joint are covered with articular cartilage, an avascular, highly durable hydrated soft tissue that provides shock absorption and lubrication functions to the joint (Fig. 1). Articular cartilage is composed mainly of water, proteoglycans, and collagen. The joint is surrounded by a fibrous joint capsule lined with synovium, which produces lubricating synovial fluid and nutrients required by the tissues within the joint. Joint motion is provided by the muscles that are attached to the bone with tendons. Strong flexible ligaments connected across the bones stabilize the joint and may constrain its motion. Different ranges of motion result from several basic types of diarthrodial joints: pivot, gliding, hinge, saddle, condyloid, and ball-and-socket. See COLLAGEN. Pivot. Pivot (trochoid) joints allow rotation of one bone about another, as does the median atlantoaxial joint (atlas about the second cervical vertebra) at the base of the skull.
Fig. 1. Cross section of the human knee showing its major components. This diarthrodial joint contains contacting surfaces on the tibia, femur, meniscus, and patella (knee cap). The patella protects the joint and also serves to redirect the force exerted by the quadriceps muscles to the tibia. (After R. Skalak and S. Chien, eds., Handbook of Bioengineering, McGraw-Hill, 1987)
Gliding. Some of the joints found in the wrist and ankle are examples of gliding (arthrodia) joints. The surfaces of these joints slide against one another and provide limited motion in multiple directions. Hinge. Although the hinge (ginglymus) joint generally provides motion in only one plane (flexion and extension), its motion, as in the human knee, can be complex. The articulating surfaces of the knee both roll and slide past each other, and the motion of the tibia relative to the femur displays a small amount of rotation. The knee also contains two fibrocartilaginous (primarily collagenous) crescent-shaped discs between the surfaces of the tibia and femur, called menisci, which provide shock absorption, stability, and lubrication functions to the knee (Fig. 1). The elbow is another example of a hinge joint. Saddle. This joint is found at the base of the human thumb between the carpal and metacarpal (trapezium). It is characterized by saddle-shaped (concave and convex) articulating surfaces. The saddle joint provides a large range of motion (flexion, extension, abduction and adduction) with limited rotation. Condyloid. The range of motion of the condyloid joint is similar to that of the saddle joint. Here, the articulating surfaces are ovoid, and one surface is convex, the other concave. This joint is found at the base of the fingers where the metacarpal and proximal phalanx meet. The surfaces of the knee joint are also of condyloid shapes (Fig. 2). Ball-and-socket. The human shoulder and hip are examples of the ball-and-socket (enarthrosis) joint. In addition to flexion, extension, abduction, and adduction, they provide rotation about the long axis of the bone. The low relative-surface-area ratio of the glenoid (socket) to humeral head (ball) of the shoul-
der gives it the largest range of motion of all human joints. This high degree of mobility is stabilized by a complex rotator cuff consisting of ligaments and muscular attachments. The larger ratio of acetabulum (socket) to femoral head (ball) surface area of the hip constrains the joint to lesser mobility but provides it with greater inherent stability. Lubrication. The lubrication mechanisms of diarthrodial joints are extremely effective. Although the surface roughness of articular cartilage is higher than that of some engineered bearing surfaces, the coefficient of friction between cartilage surfaces is significantly lower than between most manufactured materials. The rate of wear of the articular surfaces is also extremely low in the normally functioning joint. While many details of diarthrodial joint lubrication mechanisms are not fully understood, these mechanisms fall into two categories: fluid-film lubrication, where the joint surfaces are separated by a layer of lubricating fluid (synovial fluid); and boundary lubrication, where the joint surfaces are in contact with each other. These mechanisms probably occur together to varying extents, depending upon the particular joint and the type of motion. Some of the major theories involving fluid-film lubrication are hydrodynamic, squeeze film, hydrostatic, and boosted lubrication.
Fig. 2. Possible modes of lubrication in diarthrodial joints. (a–d) Different types of fluid-film lubrication showing the directions of loading, motion, and fluid flow: (a) hydrodynamic; (b) squeeze film; (c) hydrostatic weeping; (d) boosted. (e) A combination of boundary layer lubrication at points of contact and fluid-film lubrication. (After V. C. Mow and W. C. Hayes, eds., Basic Orthopaedic Biomechanics, Raven Press, 1991)
Joint (structures) Hydrodynamic. If the relative velocity between the joint surfaces is sufficiently high, a pressurized, wedge-shaped layer of the intervening synovial fluid may be formed (Fig. 2a). This fluid layer generates lift which forces the joint surfaces apart. Hydrodynamic lubrication probably does not occur to any great extent in diarthrodial joints, except under conditions of low load and high velocity that might occur, for example, during high-speed pitching of a ball. Squeeze film. When both surfaces of a joint approach each other, pressure in the synovial fluid is generated because this viscous lubricant cannot instantaneously be squeezed from the gap (Fig. 2b). The resulting pressurized squeeze film can support large loads which may be generated, for example, in the knee joint when jumping from a height. Since the articular cartilage of the joint surfaces is deformable, the synovial fluid may become trapped in localized depressions. This film may support large loads for many minutes before becoming depleted. Hydrostatic. During normal joint loading, compression of the articular cartilage surfaces may result in the weeping of fluid from the cartilage into the joint cavity (Fig. 2c). Weeping lubrication, a form of hydrostatic lubrication, is facilitated by this selfpressurization of the fluid within the cartilage. Boosted. As the surfaces of the joint approach each other during normal function, it is possible that the resulting pressure causes some of the water in the synovial fluid to be forced into the cartilage (Fig. 2d). Boosted lubrication results from the remaining, highly viscous synovial fluid concentrate between the two articulating surfaces. Boundary. Boundary layer lubrication takes place when the surfaces of the joints are touching each other. It has been proposed that a monolayer of a polypeptide chain, called lubricin, is adsorbed onto the cartilage surface. This boundary lubricant separates the surfaces during normal joint function and provides the ability to carry weight and reduce friction. Fluid-film lubrication probably occurs simultaneously in the gaps between the joint surfaces that result from undulations in the articular cartilage surfaces (Fig. 2e). The types of lubrication mechanisms active in a joint depend on the contact pattern of the surfaces, the applied load, and the relative velocity of the surfaces. The contact patterns between joint surfaces vary with different joints and also vary within a joint; depending on the relative positioning of the joint surfaces during a particular motion. The applied load and relative velocity of the surfaces depend on the particular action of the joint. Thus, different lubrication modes may occur within the same joint for different motions or even for the same motion, depending on the applied load and the speed with which the motion is accomplished. See SKELETAL SYSTEM. Van C. Mow; Robert J. Foster Bibliography. V. C. Mow and W. C. Hayes (eds.), Basic Orthopaedic Biomechanics, 2d ed., 1997; V. C. Mow, A. Ratcliffe, and S. L. Y. Woo (eds.), Biomechanics of Diarthrodial Joints, vols. 1 and 2, 1990;
C. Norkin and P. K. Levangie, Joint Structure and Function, 3d ed., 2000.
Joint (structures) The surface at which two or more mechanical or structural components are united. Whenever parts of a machine or structure are brought together and fastened into position, a joint is formed. See STRUCTURAL CONNECTIONS. Mechanical joints can be fabricated by a great variety of methods, but all can be classified into two general types, temporary (screw, snap, or clamp, for example), and permanent (brazed, welded, or riveted, for example). The following list includes many of the more common methods of forming joints. (1) Screw threads: bolt and nut, machine screw and nut, machine screw into tapped hole, threaded parts (rod, pipe), self-tapping screw, lockscrew, studs with nuts, threaded inserts, coiled-wire inserts, drive screws. (2) Rivets: solid, hollow, explosive and other blind side types. (3) Welding. (4) Soldering. (5) Brazing. (6) Adhesive. (7) Friction-held: nails, dowels, pins, clamps, clips, keys, shrink and press fits. (8) Interlocking: twisted tabs, snap ring, twisted wire, crimp. (9) Other: peening, staking, wiring, stapling, retaining rings, magnetic. Also, pipe joints are made with screw threads, couplings, caulking, and by welding or brazing; masonry joints are made with cement mortar. See ADHESIVE; ASPHALT AND ASPHALTITE; BOLT; BOLTED JOINT; BRAZING; MUCILAGE; NUT (ENGINEERING); RIVET; SCREW; SCREW FASTENER; SOLDERING; WASHER; WELDED JOINT; WELDING AND CUTTING OF MATERIALS. William H. Crouse
Joint disorders Local or generalized conditions that affect the joints and related tissues. About one person in 16 who is over 15 years old has some form of joint disturbance. The most common conditions are forms of arthritis, which cause the inflammation or degeneration of joint structures. These may follow specific infections, injury, generalized disease states, or degenerative changes which largely parallel the aging processes in the body. Rheumatoid arthritis, one of the most common diseases which involve joints, is of unknown etiology and occurs most often in young or middle-aged adults. Multiple joints are involved, especially those of the small bones in the hands, feet, and wrists or the jaw. Other connective tissue diseases, such as systemic lupus erythematosus and vasculitis, frequently are accompanied by joint symptomatology. Degenerative joint disease (osteoarthritis) is a ubiquitous joint disease prevalent in older people. This form of arthritis is characterized by deterioration of the articular surfaces of the weight-bearing joints of the knees, hips, and spine. Frequent involvement is seen at the distal interphalangeal joints
Jojoba (Heberden’s nodes). Arthritis produced by trauma may also result in degenerative joint disease. Infectious arthritis may be caused by gonorrhea, pneumococci, streptococci, and other pathogenic organisms such as tuberculosis and syphilis. Gout is an example of crystal-deposition arthritis caused by the presence of monosodium urate crystals in synovial fluid. There is an abnormality of purine metabolism that eventuates in the accumulation of urates and uric acid in joint tissues. The physical findings and clinical course are usually distinctive. See GOUT. Spondyloarthropathies constitute a group of arthritis disorders characterized by involvement of the axial (central) skeleton and association with the histocompatibility system. Miscellaneous forms of arthritis include those systemic diseases of unknown or quite different etiology which may produce arthritis or joint degeneration. Syringomyelia, syphilis (tabes dorsalis), and leprosy are examples in which destructive joint lesions may be secondary to nerve damage of the affected joints (Charcot joints). Hemophilia and sickle-cell anemia (blood disorders) and acromegaly, hypothyroidism, and hyperparathyroidism (endocrine disorders) are also examples of these unusual rheumatic diseases. See ARTHRITIS. Tumors of joint tissues are usually benign chondromas, fibromas, and lipomas, but malignant synovio sarcoma can occur. See TUMOR. Rheumatism is a nonspecific, predominantly lay term which includes local pain and tenderness. Most causes are chronic inflammations or mildly progressive degenerations, and many, when investigated, fall into one of the previously mentioned categories. See BURSITIS; RHEUMATISM. Robert Searles
Jojoba Simmondsia chinensis, the only plant known to produce and store a liquid wax in its seed. The jojoba plant is native to the southwestern United States and Mexico. It is tolerant of some of the highest temperatures and most arid regions, and is being domesticated as a crop for hot low-rainfall regions around the world. A broadleaf evergreen shrub that is typically 3–10 ft (1–3 m) in height, it can grow as tall as 20 ft (6 m). Whereas the seed-storage lipid of other oilseed plants such as soybean and sunflower is a branched ester based on the three-carbon glycerol molecule, that of jojoba is a straight-chain ester. A majority of the wax molecules of jojoba are formed from acids and alcohols with 20 or 22 carbon atoms and one double bond. Many modifications can be made at the double bond, which results in the plant’s versatility as an ingredient in a wide range of chemical products. Jojoba wax, used in cosmetics and lubricants, has the potential to serve as a basic feedstock if seed production costs are reduced. Jojoba is a relatively new commercial crop that is being developed simultaneously in many places around the world,
and cultivation methods are variable and change rapidly. See FAT AND OIL (FOOD); WAX, ANIMAL AND VEGETABLE. Site and variety selection. Jojoba is a perennial, and flower buds that are formed in the spring, summer, and fall of one year must overwinter before producing a crop in the following year. The buds can succumb at temperatures below −3.5◦C (25◦F), and so sites where such temperatures are expected are unsuitable for planting. Jojoba is sensitive to standing water, and so sites with well-drained soil away from potential flooding should be selected. Because widely tested cultivars of jojoba are unavailable, if jojoba is being introduced to an area, small test plantings should first be established and locally adapted varieties selected. Jojoba is dioecious, and both male and female plants are needed for seed production. Propagation. Jojoba is easy to grow from seed. Germination is best in the temperature range of 68–95◦F (20–35◦C). Seed-propagated plants are extremely variable in yield, size, sex, and other traits, and so use of seed is not recommended for management systems that require high yield or plant uniformity. Several asexual methods of propagation are effective for jojoba, including stem cuttage, micropropagation, layering, and grafting. These methods allow propagation of selected plants that have a high yield potential or other desired traits. Stem cuttage is the most widely used. Stem tips with four to six pairs of leaves are collected during the warmer months of the year, basally treated with an auxin, and then rooted by using standard mist propagation techniques. Rooting occurs in 4–8 weeks, and plants are ready for field planting in 4–6 months. See REPRODUCTION (PLANT). Irrigation. Although jojoba grows under natural rainfall conditions in several countries, including Paraguay and Australia, and has grown well in some plantings with natural rainfall in Arizona, most growers irrigate with a drip, furrow, or sprinkler system. Water requirements of jojoba for profitable cultivation are not well defined. In nature, jojoba grows where average annual rainfall is as low as 3 in. (76 mm), but growth is confined to washes, where rainfall is concentrated. The most vigorous plants in nature occur where average annual rainfall is 10–20 in. (250–500 mm). In commercial plantings, good yields have been obtained by using about 1.5 ft (460 mm) of water in the central valley of California and about 2 ft (600 mm) in south-central Arizona. Some growers, however, use 4 ft (1200 mm) or more. Restricting irrigation as winter approaches appears to improve frost resistance. See IRRIGATION (AGRICULTURE). Insects and diseases. Several pest and disease problems have been encountered for which control strategies must be developed. Damage has resulted from attacks by spider mites and several types of insects including stinkbugs, inchworms, thrips, grasshoppers, and lepidopterous larvae. In addition, relatively innocuous infestations with aphids, scales, and mealy bugs have been observed.
519
520
Josephson effect Several root and foliar fungal pathogens have been isolated from diseased jojoba plants growing in both field and nursery environments. To date, virus diseases have not been found in association with jojoba. See PLANT PATHOLOGY. Harvesting and processing. Although economic conditions in many countries permit hand harvesting, successful cultivation in the United States will probably rely on an efficient mechanical harvester. Growers have evaluated many types of mechanical harvesters, some of which shake the plants and catch the seed as it falls while others vacuum the seed from the ground after it has dropped. Harvesting requires several passes because of nonuniform maturity. Once collected, the seeds are cleaned by using standard technology developed for other oilseeds. The wax is then pressed or solvent-extracted from the seed. After filtering, a transparent, yellowish oil remains that may undergo an additional decolorization process. David A. Palzkill Bibliography. H. S. Gentry, The natural history of jojoba (Simmondsia chinensis) and its cultural aspects, Econ. Bot., 12:261–295, 1958; G. Robbelen, R. K. Downey, and D. A. Ashri (eds.), Oil Crops of the World, 1989; W. C. Sherbrooke and E. F. Haase, Jojoba: A Wax Producing Shrub of the Sonoran Desert, 1974; J. Wisniak, The Chemistry and Technology of Jojoba Oil, 1987.
Josephson effect The passage of paired electrons (Cooper pairs) through a weak connection (Josephson junction) between superconductors, as in the tunnel passage of paired electrons through a thin dielectric layer separating two superconductors. Nature of the effect. Quantum-mechanical tunneling of Cooper pairs through a thin insulating barrier (on the order of a few nanometers thick) between two superconductors was theoretically predicted by Brian D. Josephson in 1962. Josephson calculated the currents that could be expected to flow during such superconductive tunneling, and found that a current of paired electrons (supercurrent) would flow in addition to the usual current that results from the tunneling of single electrons (single or unpaired electrons are present in a superconductor along with bound pairs). Josephson specifically predicted that if the current did not exceed a limiting value (the critical current), there would be no voltage drop across the tunnel barrier. This zero-voltage current flow resulting from the tunneling of Cooper pairs is known as the dc Josephson effect. Josephson also predicted that if a constant nonzero voltage V were maintained across the tunnel barrier, an alternating supercurrent would flow through the barrier in addition to the direct current produced by the tunneling of unpaired electrons. The frequency ν of the alternating supercurrent is given by Eq. (1),

ν = 2eV/h        (1)

where e is
the magnitude of the charge of an electron and h is Planck's constant. The oscillating current of Cooper pairs that flows when a steady voltage is maintained across a tunnel barrier is known as the ac Josephson effect. Josephson further predicted that if an alternating voltage at frequency f were superimposed on the steady voltage applied across the barrier, the alternating supercurrent would be frequency-modulated and could have a dc component whenever ν was an integral multiple of f. Depending upon the amplitude and frequency of the ac voltage, the dc current-voltage characteristic would display zero-resistance parts (constant-voltage steps) at voltages V given by Eq. (2),

V = nhf/2e        (2)

where n is any integer. Finally, Josephson
predicted that effects similar to the above would also occur for two superconducting metals separated by a thin layer of nonsuperconducting (normal) metal. In 1963 the existence of the dc Josephson effect was experimentally confirmed by P. W. Anderson and J. M. Rowell, and the existence of the ac Josephson effect was experimentally confirmed by S. Shapiro. See NONRELATIVISTIC QUANTUM THEORY; TUNNELING IN SOLIDS. Theory of the effect. The superconducting state has been described as a manifestation of quantum mechanics on a macroscopic scale, and the Josephson effect is best explained in terms of phase, a basic concept in the mathematics of quantum mechanics and wave motion. For example, two sine waves of the same wavelength λ are said to have the same phase if their maxima coincide, and to have a phase difference equal to 2πδ/λ if their maxima are displaced by a distance δ. An appreciation of the importance that phase can have in physical systems can be gained by considering the radiation from excited atoms in a ruby rod. For a given transition, the atoms emit radiation of the same wavelength; if the atoms also emit the radiation in phase, the result is the ruby laser. According to the Bardeen-Cooper-Schrieffer (BCS) theory of superconductivity, an electron can be attracted by the deformation of the metal lattice produced by another electron, and thereby be indirectly attracted to the other electron. This indirect attraction tends to unite pairs of electrons having equal and opposite momentum and antiparallel spins into the bound pairs known as Cooper pairs. In the quantum-mechanical description of a superconductor, all Cooper pairs in the superconductor have the same wavelength and phase. It is this phase coherence that is responsible for the remarkable properties of the superconducting state. The common phase of the Cooper pairs in a superconductor is referred to simply as the phase of the superconductor. See COHERENCE; PHASE (PERIODIC PHENOMENA); QUANTUM MECHANICS. The phases of two isolated superconductors are totally unrelated, while two superconductors in perfect contact have the same phase. If the
Fig. 1. DC current-voltage characteristics of lead-lead oxide-lead Josephson tunnel junction at 1.2 K. (a) Without microwave power. (b) Same characteristic with reduced scale. (c) 11-GHz microwave power applied. (d) Expanded portion of c; arrow indicates a constant-voltage step near 10.2 mV corresponding to n = 450 in Eq. (2). This voltage is also indicated by arrows in b and c. (After T. F. Finnegan, A. Denenstein, and D. N. Langenberg, AC-Josephson-effect determination of e/h: A standard of electrochemical potential based on macroscopic quantum phase coherence in superconductors, Phys. Rev. B, 4:1487–1522, 1971)
superconductors are weakly connected (as they are when separated by a sufficiently thin tunnel barrier), the phases can be different but not independent. If φ is the difference in phase between superconductors on opposite sides of a tunnel barrier, the results of Josephson's calculation of the total current I through the junction can be written as Eq. (3),

I = I0 + I1 sin φ        (3)

where I0 is the
current due to single electron tunneling, and I1 sin φ is the current due to pair tunneling. The time dependence of the phase is given by Eq. (4),

∂φ/∂t = 2π(2eV/h)        (4)

In general, the
currents I, I0, and I1 are all functions of the voltage across the junction. For V = 0, I0 is zero and φ is constant. The value of I1 depends on the properties of
the tunnel barrier, and the zero-voltage supercurrent is a sinusoidal function of the phase difference between the two superconductors. However, it is not the phase difference that is under the control of the experimenter, but the current through the junction, and the phase difference adjusts to accommodate the current. The maximum value sin φ can assume is 1, and so the zero-voltage value of I1 is the critical current of the junction. Integration of Eq. (4) shows the phase changes linearly in time for a constant voltage V maintained across the barrier, and the current through the barrier is given by Eq. (5),

I = I0 + I1 sin [2π(2eV/h)t + φ0]        (5)

where φ0 is a constant. The supercurrent is seen to be an alternating current with frequency 2eV/h. The supercurrent time-averages to
zero, and so the direct current through the barrier is just the single-electron tunneling current I0. If the voltage across the junction is V + v cos 2πft, Eqs. (3) and (4) give Eq. (6) for the current:

I = I0 + I1 sin [2π(2eV/h)t + φ0 + (2ev/hf) sin 2πft]        (6)

The expression for the supercurrent is a conventional expression in frequency-modulation theory and can be rewritten as expression (7),

I1 Σ (n = −∞ to +∞) (−1)^n Jn(2ev/hf) sin [2π(2eV/h)t − 2πnft + φ0]        (7)

where Jn is an integer-
order Bessel function of the first kind. This expression time-averages to zero except when V = nhf/2e, in which case the supercurrent has a dc component given by (−1)^n I1 Jn(2ev/hf) sin φ0. As for the zero-voltage direct supercurrent, the phase difference φ0 adjusts to accommodate changes in current at this value of V, and the dc current-voltage characteristic displays a constant-voltage step. The dc current-voltage characteristic of a Josephson tunnel junction with and without a microwave-frequency ac voltage is shown in Fig. 1. The straightening of the current-voltage characteristic in the presence of microwave power displayed in Fig. 1c is due to the phenomenon of photon-assisted tunneling, which is essentially identical to classical rectification for the junction and frequency in question. See BESSEL FUNCTIONS. Josephson pointed out that the magnitude of the maximum zero-voltage supercurrent would be reduced by a magnetic field. In fact, the magnetic field dependence of the magnitude of the critical current is one of the more striking features of the Josephson effect. Circulating supercurrents flow through the tunnel barrier to screen an applied magnetic field from the interior of the Josephson junction just as if the tunnel barrier itself were weakly superconducting. The screening effect produces a spatial variation of the transport current, and the critical current goes through a series of maxima and minima as the field is increased. Figure 2 shows the variation of the critical current with magnetic field for a tunnel junction whose length and width are small in comparison with the characteristic screening length of the junction (the Josephson penetration depth, λJ). The mathematical function which describes the magnetic field dependence of the critical current for this case is the same function as that which describes the diffraction pattern produced when light passes through a single narrow slit. See DIFFRACTION. Josephson junctions. The weak connections between superconductors through which the Josephson effects are realized are known as Josephson junctions. Historically, superconductor-insulator-superconductor tunnel junctions have been used to
Fig. 2. Magnetic field dependence of the critical current of a Josephson tunnel junction. Data are for a tin-tin oxide-tin junction at 1.2 K, with the magnetic field in the plane of the barrier. (After D. N. Langenberg, D. J. Scalapino, and B. N. Taylor, The Josephson effects, Sci. Amer., 214(5):30–39, May 1966)
study the Josephson effect, primarily because these are physical situations for which detailed calculations can be made. However, the Josephson effect is not necessarily a tunneling phenomenon, and it is indeed observed in other types of junctions, such as the superconductor-normal metal-superconductor junction. A particularly useful Josephson junction, the point contact, is formed by bringing a sharply pointed superconductor into contact with a blunt superconductor. The critical current of a point contact can be adjusted by changing the pressure of the contact. The low capacitance of the device makes it well suited for high-frequency applications. Thin-film microbridges form another group of Josephson junctions. The simplest microbridge is a short narrow constriction (length and width on the order of a few micrometers or smaller) in a superconducting film known as the Anderson-Dayem bridge. If the microbridge region is also thinner than the rest of the superconducting film, the resulting variable-thickness microbridge has better performance in most device applications. If a narrow strip of superconducting film is overcoated along a few micrometers of its length with a normal metal, superconductivity is weakened beneath the normal metal, and the resulting microbridge is known as a proximity-effect or Notarys-Mercereau microbridge. Among the many other types of Josephson junctions are the superconductor-semiconductor-superconductor and other artificial-barrier tunnel junctions, superconductor-oxide-normal metal-superconductor junctions, and the so-called SLUG junction, which consists of a drop of lead-tin solder solidified around a niobium wire. Some different types of Josephson junctions are illustrated in Fig. 3. The dc current-voltage characteristics of different types of Josephson junctions may differ, but all show a zero-voltage supercurrent, and constant-voltage steps can be induced in the dc characteristics
at voltages given by Eq. (2) when an ac voltage is applied. The dc characteristics of a microbridge and a tunnel junction are compared in Fig. 4.
Fig. 3. Some types of Josephson junctions. (a) Thin-film tunnel junction. (b) Point contact. (c) Thin-film weak link. (After E. Burstein and S. Lundqvist, eds., Tunneling Phenomena in Solids, Plenum, 1969)
Fig. 4. DC current-voltage characteristics for a weak link and a tunnel junction. (After E. Burstein and S. Lundqvist, eds., Tunneling Phenomena in Solids, Plenum, 1969)
Applications. From July 1972 through December 1989, the United States legal volt, VNBS, was defined by Eq. (1) through the assigned value given by Eq. (8),

2e/h = 483,593,420 MHz/VNBS        (8)

and was maintained at the National Institute of Standards and Technology (formerly the National Bureau of Standards) within a few parts in 10^8 by using the ac Josephson effect. By international agreement, a new practical reference standard for the volt, based on the ac Josephson effect, was adopted worldwide on January 1, 1990. This internationally adopted representation of the volt is given by Eq. (9),

KJ−90 = 483,597.9 GHz/V        (9)

where KJ is called the Josephson constant and is theoretically
equal to 2e/h. This developed as a natural consequence of extremely precise measurements of 2e/h via the Josephson effect, and the recognition that a Josephson junction is a precise frequency-to-voltage converter and that atomic frequency standards are inherently more stable than electrochemical voltage standards. See ELECTRICAL UNITS AND STANDARDS; FUNDAMENTAL CONSTANTS. The Josephson effect permits measurement of absolute temperature: a voltage drop across a resistor in parallel with a Josephson junction causes the junction to emit radiation at the frequency given by Eq. (1), but voltage fluctuations resulting from thermal noise produce frequency fluctuations that depend on absolute temperature. See LOWTEMPERATURE THERMOMETRY. Josephson junctions, and instruments incorporating Josephson junctions, are used in other applications for metrology at dc and microwave frequencies, frequency metrology, magnetometry, detection and amplification of electromagnetic signals, and other superconducting electronics such as highspeed analog-to-digital converters and computers. A Josephson junction, like a vacuum tube or a transistor, is capable of switching signals from one circuit to another; a Josephson tunnel junction is capable of switching states in as little as 6 picoseconds and is the fastest switch known. Josephson junction circuits are capable of storing information. Finally, because a Josephson junction is a superconducting device, its power dissipation is extremely small, so that Josephson junction circuits can be packed together as tightly as fabrication techniques will permit. All the basic circuit elements required for a Josephson junction computer have been developed. See SUPERCONDUCTING DEVICES; SUPERCONDUCTIVITY. Louis B. Holdeman Bibliography. A. Barone and G. Paterno, Physics and Applications of the Josephson Effect, 1982; A. De Chiara and M. Russo, Tunneling Phenomena in High and Low Tc Superconductors, 1993.
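As a numerical illustration of the voltage-frequency relations above (a minimal sketch, not a metrological procedure), the Python fragment below evaluates Eq. (1) for a small dc voltage and Eq. (2), written with the Josephson constant of Eq. (9), for the constant-voltage steps; the 11-GHz drive frequency and the n = 450 step are the values quoted in the caption of Fig. 1.

```python
K_J90 = 483_597.9e9      # Josephson constant K_J-90 of Eq. (9), in Hz per volt

def josephson_frequency(voltage_v: float) -> float:
    """Frequency of the ac supercurrent, nu = 2eV/h = K_J * V (Eq. 1)."""
    return K_J90 * voltage_v

def step_voltage(n: int, drive_freq_hz: float) -> float:
    """Constant-voltage (Shapiro) step of Eq. (2): V = n*h*f/2e = n*f/K_J."""
    return n * drive_freq_hz / K_J90

if __name__ == "__main__":
    print(josephson_frequency(1e-6))   # ~4.84e8 Hz for a 1-microvolt junction voltage
    print(step_voltage(1, 11e9))       # ~2.3e-5 V spacing between steps for an 11-GHz drive
    print(step_voltage(450, 11e9))     # ~1.02e-2 V, the n = 450 step indicated in Fig. 1d
```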
Joule’s law
Joule’s law A quantitative relationship between the quantity of heat produced in a conductor and an electric current flowing through it. As experimentally determined and announced by J. P. Joule, the law states that when a current of voltaic electricity is propagated along a metallic conductor, the heat evolved in a given time is proportional to the resistance of the conductor multiplied by the square of the electric intensity. Today the law would be stated as H = RI2, where H is rate of evolution of heat in watts, the unit of heat being the joule; R is resistance in ohms; and I is current in amperes. This statement is more general than the one sometimes given that specifies that R be independent of I. Also, it is now known that the application of the law is not limited to metallic conductors. Although Joule’s discovery of the law was based on experimental work, it can be deduced rather easily for the special case of steady conditions of current and temperature. As a current flows through a conductor, one would expect the observed heat output to be accompanied by a loss in potential energy of the moving charges that constitute the current. This loss would result in a descending potential gradient along the conductor in the direction of the current flow, as usually defined. If E is the total potential drop, this loss, by definition, is equal to E in joules for every coulomb of charge that traverses the conductor. The loss conceivably might appear as heat, as a change in the internal energy of the conductor, as work done on the environment, or as some combination of these. The second is ruled out, however, because the temperature is constant and no physical or chemical change in a conductor as a result of current flow has ever been detected. The third is ruled out by hypothesis, leaving only the generation of heat. Therefore, H = EI in joules per second, or watts. By definition, R = E/I, a ratio which has positive varying values. Elimination of E between these two equations gives the equation below, which is Joule’s law as stated H = RI 2 above. If I changes to a new steady value I, R to R, and H and H = RI2 as before. The simplest case occurs where R is independent of I. If the current is varying, the resulting variations in temperature and internal energy undoubtedly exist and, strictly speaking, should be allowed for in the theory. Yet, in all but the most exceptional cases, any correction would be negligible. This phenomenon is irreversible in the sense that a reversal of the current will not reverse the outflow of heat, a feature of paramount importance in many problems in physics and engineering. Thus the heat evolved by an alternating current is found by taking the time average of both sides of the equation. Incidentally, the changes in internal energy, if they were included in the theory, would
average out. Hence the equation continues to have a similar form, H = RI², for ac applications. See ELECTRIC HEATING; OHM'S LAW. Lewellyn G. Hoxton; John W. Stewart
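As a worked illustration of the relationships above, the short Python sketch below evaluates H = RI² for a steady current and, for a sinusoidal current, takes the time average of I² so that the same form of the law applies. The resistor value and currents are arbitrary illustrative numbers, not taken from the article.

```python
import numpy as np

def joule_heat_dc(resistance_ohm: float, current_a: float) -> float:
    """Rate of heat evolution H = R * I**2, in watts."""
    return resistance_ohm * current_a**2

def joule_heat_ac(resistance_ohm: float, peak_current_a: float) -> float:
    """For a sinusoidal current the time average of I**2 is (peak**2)/2,
    so H = R * <I**2> keeps the same form with the rms current."""
    t = np.linspace(0.0, 1.0, 10_000)            # one period of a 1-Hz sine
    i = peak_current_a * np.sin(2 * np.pi * t)   # instantaneous current
    return resistance_ohm * np.mean(i**2)        # numerically ~ R * peak**2 / 2

if __name__ == "__main__":
    R = 10.0                          # ohms (illustrative)
    print(joule_heat_dc(R, 2.0))      # 40 W for a steady 2-A current
    print(joule_heat_ac(R, 2.0))      # ~20 W for a 2-A-peak alternating current
```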
Juglandales An order of flowering plants, division Magnoliophyta (Angiospermae), in the subclass Hamamelidae of the class Magnoliopsida (dicotyledons). The order consists of two families: the Juglandaceae with a little over 50 species and the Rhoipteleaceae with only one species. Within its subclass the order is sharply set off by its compound leaves. Juglans (walnut and butternut) and Carya (hickory, including the pecan, C. illinoensis) are familiar genera of the Juglandaceae. See HAMAMELIDAE; HICKORY; MAGNOLIOPHYTA; MAGNOLIOPSIDA; PLANT KINGDOM. Arthur Cronquist; T. M. Barkley
Juncales An order of flowering plants, division Magnoliophyta (Angiospermae), in the subclass Commelinidae of the class Liliopsida (monocotyledons). The order consists of the family Juncaceae, with about 300 species, and the family Thurniaceae, with only three. Within its subclass the order is marked by its reduced, mostly wind-pollinated flowers and capsular fruits with one to many anatropous ovules per carpel. The flowers have six sepals arranged in two more or less similar whorls, both sets chaffy and usually brown or green. The ovary is tricarpellate, with axile or parietal placentation. The pollen grains are borne in tetrads, and the embryo is surrounded by endosperm. The order is most unusual among higher plants in that, together with at least some members of the Cyperaceae in the related order Cyperales, it has chromosomes with diffuse centromeres. See COMMELINIDAE; CYPERALES; LILIOPSIDA; MAGNOLIOPHYTA; PLANT KINGDOM. Arthur Cronquist
Junction detector A device in which detection of radiation takes place in or near the depletion region of a reverse-biased semiconductor junction. The electrical output pulse is linearly proportional to the energy deposited in the junction depletion layer by the incident ionizing radiation. See CRYSTAL COUNTER; IONIZATION CHAMBER. Introduced into nuclear studies in 1958, the junction detector, or more generally, the nuclear semiconductor detector, revolutionized the field. In the detection of both charged particles and gamma radiation, these devices typically improved experimentally attainable energy resolutions by about two
Junction detector thin n -type doped layer
thin gold layer + − +䊞 + electron − bias motion power − + supply 䊝 − − 䊝
n -type-base single-crystal silicon
(a)
䊞 + 䊞 + hole motion 䊝 − 䊝 − −
ohmic contact
p -type layer formed by surface treatment depletion region
+ − bias power supply
− 䊝 − 䊝 − 䊝− 䊝 electron 䊞 + 䊞 motion +䊞 + + hole motion + 䊞 + +
107 Ω
JFET
− 䊞 䊞 +
ohmic contact
chargesensitive amplifier
107 Ω
(b)
Key: 䊞 p-type dopant ion 䊝 n-type dopant ion
− electron + hole
Fig. 1. Silicon junction detectors. (a) Surface barrier detector. (b) A pn junction detector. The p-type dopant ions are fixed in the crystal lattice. JFET = junction field-effect transistor.
orders of magnitude over that previously attainable. To this they added unprecedented flexibility of utilization, speed of response, miniaturization, freedom from deleterious effects of extraneous electromagnetic (and often nuclear) radiation fields, lowvoltage requirements, and effectively perfect linearity of output response. They are now used for a wide variety of diverse applications, from logging drill holes for uranium to examining the Shroud of Turin. They are used for general analytical applications, giving both qualitative and quantitative analysis in the microprobe and the scanning transmission electron microscopes. They are used in medicine, biology, environmental studies, and the space program. In the last category they play a very fundamental role, ranging from studies of the radiation fields in the solar system to the composition of extraterrestrial surfaces. See ELECTRON MICROSCOPE; SECONDARY ION MASS SPECTROMETRY (SIMS). Fabrication of diodes. The first practical detectors were prepared by evaporating a very thin gold layer on a polished and etched wafer of n-type germanium (Ge). To reduce noise these devices were operated at liquid nitrogen temperature (77 K). Silicon (Si), however, with its larger band gap, 1.107 eV compared to 0.67 eV for germanium, offered the possibility of room-temperature operation. Gold-silicon surface barrier detectors and silicon pn junction detectors were soon developed. Surface barrier detectors are made from wafers of n-type silicon semiconductor crystals. The etching and surface treatments create a thin p layer, and the gold contacts this layer (Fig. 1a). The pn junction silicon detectors are usually made by diffusing phosphorus about 2 micrometers into the surface of a p-type silicon base (Fig. 1b). Both techniques give a pn junction. When this junction is reverse-biased, a depletion region, or a region devoid of carriers (electrons and holes), forms mainly in the higher-resistivity base
material. A high field now exists in this region, and any carriers born or generated in it are rapidly swept from the region. The requirement for detection is that the ionizing radiation must lose its energy by creating electron-hole pairs (2.96 eV/pair in germanium and 3.66 eV/pair in silicon) in the depletion region or within a carrier diffusion length of this region. Both carriers have to be collected to give an output pulse proportional to the energy of the incident particle. Electrons and holes have similar mobilities in both silicon and germanium, and although carrier trapping occurs it is not as severe as in the 2–16 compounds.
Fig. 2. Compensation of p-type semiconductor material with lithium (at 212–392°F or 100–200°C for silicon, 104–140°F or 40–60°C for germanium). Boron ions are fixed in the lattice. Lithium ions are fixed in the lattice, but at elevated temperature can be drifted under an electric field and will compensate boron ions to widen the depletion region.
to the voltage that can be applied to a junction. Thus detectors for higher-energy or lower-mass particles (electrons) requiring wider depletion regions are made from high-resistivity material. This material occurs, by accident, during the growth of some crystals. Lithium-drifted silicon detectors. Still wider depletionwidth detectors can be made from lithium-drifted silicon. Lithium (Li) is a donor in silicon. In addition, at elevated temperatures (392◦F or 200◦C), the lithium ion is itself mobile. Thus when lithium is diffused in p-type silicon, a pn junction results. Reversebiasing this junction at elevated temperatures causes the lithium ion, now appearing as a positive charge, to migrate toward the negative side. On the way it encounters an acceptor ion, negatively charged, which is fixed in the crystal lattice. The lithium ion and the acceptor ion compensate each other, and the lithium ion remains in this location. As more lithium ions drift into and across the depletion region, they compensate the acceptor ions and the region widens (Fig. 2). Depletion regions, or compensated regions, up to 0.8 in. (2 cm) wide have been achieved with this technique. Lithium-drifted silicon detectors can be operated at room temperature, but the larger volume gives a greater thermally generated leakage current, which degrades the resolution. The best energy resolution is obtained by operating the detectors at low temperature. However, they may be stored at room temperature. Lithium-drifted silicon detectors are widely used to detect particle- or photon-induced x-rays. The resolution, when operated at 77 K, is sufficient to resolve the K x-rays for all elements higher in atomic number Z than carbon (Z = 6). A resolution of 100 eV has been obtained at 2 keV. At the lower x-ray energies the effects of the detector window thickness and the absorption in the window of the mounting are important, and silicon is preferred for these applications. For x-rays the efficiency of a 5-mm depletionwidth lithium-drifted silicon detector is about 50% at 30 keV and 5% at 60 keV. Typically these detectors have capacitances of about 2 picofarads and, to minimize noise, are operated with an optical or diode reset mechanism rather than a feedback resistor (Fig. 3). The detector bias is about 1000 V, and the junction field-effect transistor (JFET) gate operates at about −2.5 V. A radiation event causes a pulse of current in the detector. The amplifier drives this current i through the feedback capacitor with capacitance C and in doing so steps a voltage an amount estep proportional to the charge, as given by the equation below. Each subsequent radi-
Fig. 3. Reset mechanisms for junction detectors. (a) Diode reset. (b) Optical reset. (c) Amplifier output and differentiated output for the pulse height analyzer.
e_step = (1/C) ∫ i dt
ation event causes a voltage step. To keep the amplifier within its dynamic range the feedback capacitor must be discharged. The analyzing circuits are first gated off, and in the diode case (Fig. 3a) the reverse bias on the diode is momentarily increased to give a picoampere current pulse. The amplifier output
Junction detector lithium diffused n contact
lithium diffused n contact
p core
compensated region
compensated region
p -type base material (a)
(b)
(c)
Fig. 4. Lithium-drifted detectors. (a) Planar. (b) Coaxial. (c) Open-one-end coaxial.
voltage changes to allow this current to flow through the feedback capacitor, discharging it. The analyzing circuit is now gated on, and counting can resume. For the optical reset (Fig. 3b), a light is flashed on the JFET, momentarily increasing the source-to-gate leakage current and discharging the feedback capacitor. The output from the amplifier and the differentiated output for the analyzer are shown in Fig. 3c. Lithium-drifted germanium detectors. Germanium with its higher atomic number, 32 compared with 14 for silicon, has higher radiation absorption than silicon. Lithium may also be drifted in germanium. But in germanium, lithium is mobile at room temperature and will precipitate or diffuse further if the units, after fabrication, are not kept at liquid nitrogen temperature. Lithium-drifted germanium detectors revolutionized the field of gamma-ray spectroscopy. They may be manufactured in planar, coaxial, or open-one-end coaxial geometry (Fig. 4).
Figure 5 compares the gamma-ray spectrum of 188Os taken with a 21-cm3 lithium-germanium detector with that from a sodium iodide (NaI) scintillator-type spectrometer which is 3 in. (7.5 cm) in diameter by 3 in. deep (27 in.3 or 330 cm3). The counting efficiency of the lithium-germanium detector is lower than that of the scintillator, but the resolution is at least an order of magnitude better. This higher resolution often reduces the actual counting time needed to adequately identify a particular energy peak, even with an order-of-magnitude smaller sensitive volume. Also, as shown in Fig. 5, the lithium-germanium detector is able to resolve more energy groups than the scintillator. Hyperpure germanium detectors. Intrinsic or hyperpure germanium (Fig. 6) was grown to overcome the low-temperature-storage and the lithium-drifting problems associated with lithium-germanium. Planar detectors with up to a 0.8-in.-thick (2-cm) depletion region and coaxial detectors with 3-in.3 (50-cm3)
Fig. 5. Gamma-radiation spectra from 188Os as detected in sodium iodide (Nal) and lithium-germanium (Li-Ge) spectrometers. 1 cm = 0.4 in.; 1 cm3 = 6 × 10−2 in.3
Fig. 6. High-purity germanium gamma-ray detector.
volume have been made with the material. Low-temperature processing is used in the fabrication—usually lithium diffused at 536°F (280°C) for the n+ contact and implanted boron for the p+ contact. This low-temperature processing is desirable to prevent diffusion of copper, with its subsequent charge trapping, into the germanium. Presently hyperpure germanium detectors cannot be made either as large as, or with as high a resolution as, lithium-germanium detectors. Both types are operated at liquid nitrogen temperature, 77 K. However, the hyperpure germanium detector is easier to manufacture and can be stored at room temperature when not in use. This is a tremendous practical advantage. Special detector configurations. Among the many other advantages of semiconductor detectors is the ease with which special detector configurations may be
Fig. 7. Schematic of use of annular detector in nuclear reaction studies.
fabricated. One of the simple yet very important examples of this is the annular detector (Fig. 7), which is characteristically used to detect nuclear reaction products from a bombarded target in a tight cone around the incident beam. By examining the decay radiation in coincidence with such products, studies may be carried out only on residual nuclei which have had their spins very highly aligned in the nuclear reaction; this has been shown to provide an extremely powerful nuclear spectroscopic probe. The annular detector is extensively used in laboratories worldwide. Composite detector systems are very readily assembled with the semiconductor devices. For example, it is standard in charged-particle detection to use a very thin detector and a very thick detector (or even two thin and one thick) in series. Multiplication of the resultant signals readily provides a characteristic identification signature for each nuclear particle species in addition to its energy. Three-crystal gamma-ray spectrometers are readily assembled, wherein only the output of the central detector is examined whenever it occurs in time coincidence with two correlated annihilation quanta escaping from the central detector. These systems essentially eliminate background from Compton scattering or other more complex electromagnetic interactions and yield sharp single peaks for each incident photon energy (Fig. 8). Similarly, neutrons may be indirectly detected through examination of recoil protons from a hydrogenous radiator in the case of high-energy neutrons, or through examination of fission fragments resulting from slow neutrons incident on a fissile converter foil mounted with the semiconductor detectors. (It should be noted that the response of the detectors is essentially perfectly linear all the way from electrons and photons to fission fragments.) Neutrons also may be detected and their energy spectra studied through examination of the charged products of the (n,α) reaction (where alpha particles are emitted from incident neutrons) induced in the silicon or germanium base material of the detector itself. Fabrication of triodes. Whereas the detectors thus far discussed are electrically nothing more than diodes, it has been possible to construct equivalent triodes which have extremely important uses in that they provide not only an output which is linearly proportional to the energy deposited in them, but also a second output which in combination with the first establishes the precise location on the detector itself where the ionizing radiation was incident. This has very obvious advantages in the construction of simple systems for the measurement of angular distributions, where such position-sensitive detectors are located about a bombarded target. Their most important impact, however, has been in terms of their on-line use in the focal planes of large nuclear magnetic spectrographs. Simultaneous determination of the energy and location of a particle in the focal plane, together with the momentum
Fig. 8. Comparison of (a) direct single detector and (b) three-crystal spectrometer spectra from 24Na source.
determination by the magnet itself, establishes unambiguously both the mass and energy of the particle, and does so instantaneously so that additional logical constraints may be imposed through a connected on-line computer—something totally impossible with the earlier photographic plate focal-plane detectors (Fig. 9).
A further important utilization of the nuclear triodes has followed their fabrication in an annular geometry similar to that shown in Fig. 7. With radial position sensitivity it becomes possible to correct online, and event by event, for the kinematic variation of particle energy with angle over the aperture of the detector. Without this correction possibility all
Fig. 9. Schematic of position-sensitive (nuclear triode) detector in focal plane of 180° magnetic spectrograph. The position signal is P = [y/(x + y)] × E, where E is the particle energy.
Junction diode particle group structures in the detector spectrum are smeared beyond recognition. See PARTICLE DETECTOR; SEMICONDUCTOR. James M. McKenzie Bibliography. D. A. Bromley, Detectors in nuclear science, Nucl. Instrum. Meth., 162:1–8, 1979; G. T. Ewan, The solid ionization chamber, Nucl. Instrum. Meth., 162:75–92, 1979; E. Laegsgaard, Positionsensitive semiconductor detectors, Nucl. Instrum. Meth., 162:93–111, 1979; J. M. McKenzie, Development of the semiconductor radiation detector, Nucl. Instrum. Meth., 162:49–73, 1979.
Junction diode A semiconductor rectifying device in which the barrier between two regions of opposite conductivity type produces the rectification (Fig. 1). Junction diodes are used in computers, radio and television, brushless generators, battery chargers, and electrochemical processes requiring high direct current and low voltage. Lower-power units are usually called semiconductor diodes, and the higher-power units are usually called semiconductor rectifiers. For a discussion of conductivity types, carriers, and impurities see SEMICONDUCTOR. Junction diodes are classified by the method of preparation of the junction, the semiconductor material, and the general category of use of the finished device. By far the great majority of modern junction diodes use silicon as the basic semiconductor material. Germanium material was used in the first decade of semiconductor diode technology, but has given way to the all-pervasive silicon technology, which allows wider temperature limits of operation and produces stable characteristics more easily. Other materials are the group 13–15 compounds, the most common being gallium arsenide, which is used where its relatively large band-gap energy is needed. A partial list of silicon types includes the diffused silicon switching diode, alloyed silicon voltage reference diode, epitaxial planar silicon photodiode, and diffused silicon rectifier. Other types include the ionimplanted varactor diode and the gallium arsenide light-emitting diode.
Fig. 1. Section of a bonded or fused junction diode.
Fig. 2. High-speed diffused silicon diodes. (a) Mesaless structure. (b) Mesa structure.
In silicon units nearly all categories of diodes are made by self-masked diffusion, as shown in Fig. 2a. Exceptions are diodes where special control of the doping profile is necessary. In such cases, a variety of doping techniques may be used, including ion implantation, alloying with variable recrystallization rate, silicon transmutation by neutron absorption, and variable-impurity epitaxial growth. The mesa structure shown in Fig. 2b is used for some varactor and switching diodes if close control of capacitance and voltage breakdown is required. See ELECTRONIC SWITCH; RECTIFIER; SEMICONDUCTOR DIODE. Fabrication methods. The alloy and mesa techniques are largely historical, but were important in the development of junction diodes. The alloy junction section (Fig. 1) is produced by placing a pill of doped alloying material on the clean flat surface of a properly oriented semiconductor wafer and heating it until the molten alloy dissolves a portion of the semiconductor immediately beneath it. Upon cooling, the dissolved semiconductor, now containing the doping impurity, recrystallizes upon the surface of the undissolved semiconductor, reproducing its crystal structure and creating a pn junction at the position marking the limit of the solution of the original wafer. If such a junction is held at the peak temperature of its alloying cycle for sufficient time to allow diffusion of the alloy impurity beyond the limit of the dissolved semiconductor into the solid semiconductor, the junction produced is called alloy-diffused. The planar diffused junction section (Fig. 2a) is produced in silicon by first polishing the top surface of a large silicon wafer and then oxidizing the surface by heating the wafer at about 1000◦C (1800◦F) in the presence of wet oxygen. After about 0.5 micrometer of oxide has grown on the surface, the wafer is cooled, and an array of holes is opened in the
Junction diode oxide by high-precision etching geometrically controlled by a photoresist technique. A very heavily doped thin oxide layer is chemically deposited in the holes opened in the oxide. This predeposition step is followed by a drive-in diffusion at a higher temperature, causing the deposited impurity to penetrate the substrate, thereby forming diffused pn junctions beneath each hole. Subsequently the individual junctions are separated out of the large wafer by scribing and breaking and are encapsulated as individual diodes. Such planar diffused diodes have relatively high breakdown voltages and low leakage currents. The ends of the junction are automatically protected by the oxide mask so that such diodes show longterm stability. This protection by the oxide is often referred to as passivation. Planar diodes and planar transistors are used in integrated circuits. The diodes in integrated circuits usually consist of the emitter junction or collector junction of a transistor structure rather than being fabricated as a separate diode. Most discrete diodes are power rectifiers, voltage regulators, varactors, or light-emitting diodes. See INTEGRATED CIRCUITS; VARACTOR; VOLTAGE REGULATOR. The mesa structure (Fig. 2b) is produced by diffusing the entire surface of the large wafer and then delineating the individual diode areas by a photoresistcontrolled etch that removes the entire diffused area except the island or mesa at each diode site. Still another method of doping control used in modern diodes is through epitaxially deposited material. In this process the polished wafer is subjected at an elevated temperature to a vapor containing a compound of the semiconductor together with a compound containing the appropriate doping element. These compounds decompose upon contact with the surface of the wafer and cause the semiconductor to grow a layer of doped material on its surface. Under proper conditions of cleanliness and growth rate, the underlying crystal structure is propagated into the growing layer, which is then said to be epitaxial in character. In this way either localized areas or entire surfaces of either conductivity type may be produced. In diode fabrication it is typical to use the epitaxially grown material as a lightly doped layer over the original substrate material of the same conductivity type. The junction is then formed in the lightly doped layer by masked diffusion of the opposite-conductivity-type material. By this means the thickness of the web of lightly doped material immediately beneath the diffusion can be controlled to give both a desired reverse breakdown voltage and a relatively constant capacitance. Forward-bias recovery time can be controlled in a trade-off with reverse breakdown voltage in such a structure. A method of doping control used when special doping concentration profiles are needed, or when localized doping must be accomplished without selfmasking oxide, is ion implantation. At present the largest use of this technique in pn junction fabrication is to replace the chemical predeposition step in the planar diffusion process. Here ion implantation
gives a much more precise control of the sheet resistivity of the diffusion, and it can be accomplished without opening holes in the protective oxide. Crystal damage is automatically healed during the subsequent drive-in diffusion. See ION IMPLANTATION. Junction rectification. Rectification occurs in a semiconductor wherever there is a relatively abrupt change of conductivity type. In any semiconductor the product of the concentrations of the majority and minority current carriers is a temperaturedependent equilibrium constant. The conductivity is proportional to the majority carrier concentration and inversely proportional to the minority-carrier concentration. When a pn junction is reverse-biased (p-region negative with respect to the n-region), the majority carriers are blocked completely by the barrier, and only the minority carriers can flow under the barrier. This minority carrier current is the sum of the individual currents from the n- and p-regions, and each component is inversely proportional to the conductivity of its region. In addition, there is a thermal regeneration current of minority carriers generated in the depletion region of the reverse-biased junction. In silicon the regeneration current dominates and is about 10-3 A/m2 at room temperature. When a pn junction is forward-biased (p-region positive with respect to the n-region), the majority hole and electron distributions can flow into the opposite region because the bias has markedly lowered the barrier. Since electrons flowing into a p-region or holes flowing into an n-region represent a great increase in minority-carrier concentration, the thermodynamic equilibrium of the holes and electrons is disturbed, and the product of their concentrations increases as the junction is approached. The resistivity of both the n- and p-type regions is considerably lowered by these excess minority carriers, and the forward current is greater than the current through a geometrically equivalent bar of material containing no pn junction. The electrons in an n-type semiconductor are given up to the conduction process by donor impurity atoms which remain as fixed, positively charged centers. Similarly, the holes of a p-region are created by the capture of electrons by acceptor impurity atoms which remain as fixed, negatively charged centers. In both cases the space charge of the ionized impurity centers is neutralized by the space charge of the majority carriers. At a pn junction the barrier that keeps majority carriers away consists of a dipole layer of charged impurity centers, positive on the n-type side and negative on the p-type side. When a reverse bias is applied, the barrier height increases and requires more charge in the dipole layer to produce the required step in voltage. To add to the charge, the layer must widen, because ionized impurities are in fixed positions in the crystal. As the layer widens, the capacitance of the junction decreases since the plates of the capacitor are farther apart. Therefore, a pn junction acts as a variable capacitance as well as a variable resistance. In this application, it is called a varicap or a varactor diode.
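The widening of the depletion layer under reverse bias, and the accompanying fall in capacitance, can be put in rough quantitative terms. For an abrupt (step) junction the depletion capacitance is expected to vary roughly as the inverse square root of the total junction voltage; this standard textbook relation is not stated explicitly in the article, so the Python sketch below is only an illustrative model, with a 1-pF zero-bias capacitance and a 0.7-V built-in potential chosen arbitrarily.

```python
def varactor_capacitance(reverse_bias_v: float,
                         c_zero_bias_f: float = 1e-12,
                         built_in_v: float = 0.7) -> float:
    """Abrupt-junction model: C(V) = C0 / sqrt(1 + V_R / V_bi).

    The depletion layer widens as the reverse bias V_R grows, so the
    capacitance (plates farther apart) drops -- the varactor action
    described in the text.  C0 and V_bi are illustrative assumptions.
    """
    return c_zero_bias_f / (1.0 + reverse_bias_v / built_in_v) ** 0.5

if __name__ == "__main__":
    for v in (0.0, 1.0, 5.0, 20.0):
        print(f"{v:5.1f} V reverse bias -> {varactor_capacitance(v) * 1e12:.2f} pF")
```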
Fig. 3. Junction diode characteristics. (a) Rectification. (b) Switching. (c) Silicon switching diode capacitance. °F = (°C × 1.8) + 32.
Optical properties. When light of sufficient energy is absorbed by a semiconductor, excess minority carriers are created. In a pn junction device these excess carriers will increase the reverse-bias leakage current by a large factor if they are within diffusion distance of the junction. If the junction is open-circuited, a forward voltage will develop to oppose the diffusion of the excess carriers generated by the light absorption. This photovoltaic response is the basis of the operation of solar cells and most photographic exposure meters. See EXPOSURE METER; PHOTOVOLTAIC CELL; PHOTOVOLTAIC EFFECT; SOLAR CELL. The inverse of the above effect also exists. When a pn junction in a so-called direct-gap semiconductor is forward-biased, the electrically injected excess minority carriers recombine to generate light. This is the basis of light-emitting diodes and injection lasers. Typical direct-gap semiconductors (suitable for lightemitting diodes) are compounds between 13 and 15 group elements of the periodic table such as gallium arsenide. See LASER. Characteristics. Typical rectification, switching, and capacitance characteristics of a junction diode
are shown in Fig. 3. Rectification characteristics (Fig. 3a) show that silicon units provide much lower reverse leakage currents and higher voltage breakdowns and can operate up to 200◦C (392◦F). For switching purposes, turn-on and turn-off times are most important (Fig. 3b). The turn-on time of a diode is governed by its junction capacitance and is usually short. The turn-off time, usually the critical characteristic, is governed by the time required to remove all of the excess minority carriers injected into the n- and p-regions while the diode was in the forward-bias state. This is called the minority carrier storage effect, and it is of the order of a few microseconds for good switching diodes. Silicon diodes are usually somewhat superior to germanium units in this respect. The limits of operation of present junction diodes are about 2500 V reverse-standoff voltage and 1500 A forward current in power rectifiers; about 1.0 nanosecond reverse recovery time and 100 picoseconds rise time for fast-switching diodes; a minimum reverse leakage current in a small signal diode is about 0.01 nA.
For further discussion of the properties of pn junctions see JUNCTION TRANSISTOR; TRANSISTOR. Lloyd P. Hunter Bibliography. J. Singh, Semiconductor Devices: An Introduction, 1994; S. M. Sze, Semiconductor Devices, 1985; E. S. Yang, Microelectronic Devices, 1988; M. Zambuto, Semiconductor Devices, 1989.

Junction transistor
A transistor in which emitter and collector barriers are formed by pn junctions between semiconductor regions of opposite conductivity type. These junctions are separated by a distance considerably less than a minority-carrier diffusion length, so that minority carriers injected at the emitter junction will not recombine before reaching the collector barrier and therefore be effective in modulating the collector-barrier impedance. Junction transistors are widely used both as discrete devices and in integrated circuits. The discrete devices are found in high-power and high-frequency applications. Junction transistors range in power rating from a few milliwatts to about 300 W, in characteristic frequency from 0.5 to 2000 MHz, and in gain from 10 to 50 dB. Silicon is the most widely used semiconductor material, although germanium is still used for some applications. Junction transistors are applicable to any electronic amplification, detection, or switching problem not requiring operation above 200◦C (392◦F), 700 V, or 2000 MHz. Not all these limits can be achieved in one device, however. Junction transistors are classified by the number and order of their regions of different conductivity type, by the method of fabricating and structure, and sometimes by the principle of operation. Most modern transistors are fabricated by the silicon self-masked planar double-diffusion technique. The alloy technique and the grown-junction technique are primarily of historical importance. For a general description and definition of terms used here and a description of the mechanism of operation see TRANSISTOR. Alloy-junction transistors. Also called fused-junction transistors, these are made in the pnp and npn forms. The emitter and collector regions are formed by recrystallization of semiconductor material from a solution of semiconductor material dissolved in some suitably doped metal alloy. The major metal of the alloy serves as the solvent for the semiconductor, while the minor element serves as a source for doping impurity in order to render the recrystallized material opposite in conductivity type to the original wafer. Alloy junctions are abrupt and allow for bidirectional operation. They usually show a low series resistance, and were therefore used in high-power transistors. Figure 1 compares several transistor profiles which show how the impurity content varies through the structure. In these profiles Cp is the concentration of the p-type impurity; Cn is the con-
centration of the n-type impurity. The net impurity content determines the conductivity type and magnitude. The profile of the alloy transistor shows that there are abrupt changes of impurity concentration at emitter and collector junctions and that the conductivities of emitter and collector regions are therefore high compared to those of the base region. Such a structure shows good emitter-injection efficiency but only moderate collector-voltage rating and relatively high collector capacitance. See SEMICONDUCTOR.

Fig. 1. Conductivity and impurity profiles of typical junction transistors. (a) pnp alloy-junction type. (b) pnp grown-junction type. (c) npn double-diffused-junction type. (d) npn epitaxial double-diffused-junction type. 1 mil = 25.4 µm.
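The remark that the net impurity content fixes both the conductivity type and its magnitude can be made concrete with a rough sketch. The mobilities and intrinsic carrier density below are typical room-temperature silicon values assumed for illustration (mobility degradation at heavy doping is ignored); none of these numbers come from Fig. 1.

```python
Q = 1.602e-19          # electron charge, C
NI_SILICON = 1.0e10    # assumed intrinsic carrier density of Si at 300 K, cm^-3
MU_N, MU_P = 1350.0, 480.0   # assumed electron/hole mobilities, cm^2/(V*s)

def conductivity(cn_donors, cp_acceptors):
    """Return (type, sigma in S/cm) from donor/acceptor concentrations in cm^-3."""
    net = cn_donors - cp_acceptors
    if net >= 0:                       # donors dominate -> n-type
        n = net if net > 0 else NI_SILICON
        p = NI_SILICON**2 / n
        return "n-type", Q * (n * MU_N + p * MU_P)
    p = -net                           # acceptors dominate -> p-type
    n = NI_SILICON**2 / p
    return "p-type", Q * (n * MU_N + p * MU_P)

# Example: an emitter-like heavily doped region versus a lightly doped base.
print(conductivity(1e19, 1e17))   # strongly n-type, high conductivity
print(conductivity(1e15, 1e17))   # net p-type base, much lower conductivity
```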
Grown-junction transistors. These are made in the pnp and npn forms, as well as in more complicated forms. There are several variations of the grown-junction technique. The simplest consists of successively adding different types of impurities to the melt from which the semiconductor crystal is being grown. A semiconductor crystal is usually grown by dipping the end of a seed crystal into molten semiconductor and by arranging the thermal gradients so that new semiconductor solidifies on the end of the seed as it is slowly withdrawn. The solid-liquid interface is roughly a plane perpendicular to the axis of withdrawal. A pnp structure can be grown by starting with a p-type melt; by adding, at one point in the crystal growth, enough n-type impurity to give a slight excess over the p-type impurity originally present; and, after growth has continued for a few micrometers, by adding an excess of p-type impurity. The last-grown region will be the emitter region, and the original p-type crystal will be the collector region. The impurity profile of such a structure is shown in Fig. 1b. The high-conductivity emitter region gives a good injection efficiency, and the junction between the base and collector regions is gradual enough so that the unit will show a relatively low collector capacitance and a high breakdown voltage. The one disadvantage of this method is that both the collector and base regions show relatively high series resistances. Planar diffused epitaxial transistors. The structure of this transistor is shown in section in Fig. 2, and the doping profile through the emitter, base, and collector is shown in Fig. 1d. In this structure both collector and emitter junctions are formed by diffusion of impurities from the top surface, as shown in Fig. 2. Using silicon, the structure is formed by growing a diffusion mask of native oxide (silicon dioxide) on the previously polished wafer. A hole is opened in the oxide by a photoresist etch technique (Fig. 2a) to define the area of the collector buried layer. For a p-type substrate a heavy concentration (n+) of n-type impurity such as phosphorus is diffused into the substrate through the opening in the masking oxide. The oxide is etched away, and an epitaxial layer of lightly doped (n−) silicon is grown over the entire wafer by vapor decomposition at a temperature low enough to prevent significant diffusion of the n+ material out of the buried layer (Fig. 2b). A new oxide layer is grown on the surface of the epitaxial layer, and an opening is etched in it to define the limits of the p-type base diffusion (Fig. 2c). (This automatically controls the collector junction geometry and capacitance.) The masking oxide is again stripped and regrown for the masking of the n+ diffusion used to form the emitter and collector contact region (Fig. 2d). Next the emitter mask is removed, and an impervious layer of oxide is formed over the surface of the crystal. A layer of glass is bonded to the crystal by means of the oxide layer. The glass must match the expansion coefficient of silicon fairly well, and the oxide must be sufficiently impervious to the glass at the bonding temperature to prevent the diffusion of
impurities from the glass into the silicon transistor structure. Finally, holes are etched in the glass-oxide structure so that electrical contact can be made to the various regions of the transistor (Fig. 2e). In modern technology the above-described base and emitter diffusions are carried out in two steps: a predeposition step, in which a very thin layer of heavily doped oxide is chemically deposited over the open surface of the silicon in the hole opened in the masking oxide; and a drive-in diffusion step, in which the deposited dopant is diffused into the silicon at a higher temperature than that used for the predeposition. This typically controls the sheet resistance of the final diffusions to about ±10% of the design value. The forward current-transfer ratio of the transistor is determined by the ratio of these sheet resistances through the medium of the injection efficiency of minority carriers.

Fig. 2. Double-diffused planar epitaxial transistor structure and method of fabrication. (a) Buried layer. (b) Epitaxial layer. (c) Collector junction formation. (d) Emitter junction. (e) Contact stripe placement.
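A common first-order model of the drive-in step treats the predeposited dose as a fixed quantity that redistributes as a Gaussian profile; the sketch below shows that textbook approximation and the resulting junction depth. The dose, diffusivity, drive-in time, and background doping are assumed example values, not figures taken from the text.

```python
import math

def drive_in_profile(x_cm, dose_cm2, diff_cm2_s, time_s):
    """Gaussian drive-in profile C(x) = Q / sqrt(pi*D*t) * exp(-x^2 / (4*D*t)), cm^-3."""
    dt = diff_cm2_s * time_s
    return dose_cm2 / math.sqrt(math.pi * dt) * math.exp(-x_cm**2 / (4.0 * dt))

def junction_depth(dose_cm2, diff_cm2_s, time_s, background_cm3):
    """Depth (cm) at which the diffused profile falls to the background doping."""
    dt = diff_cm2_s * time_s
    surface = dose_cm2 / math.sqrt(math.pi * dt)
    return 2.0 * math.sqrt(dt * math.log(surface / background_cm3))

# Assumed example: a base drive-in with a 1e14 cm^-2 dose, D ~ 3e-13 cm^2/s,
# 60 minutes, into an n- epitaxial layer doped 1e16 cm^-3.
Q_DOSE, D, T, NB = 1e14, 3e-13, 3600.0, 1e16
print("surface concentration:", drive_in_profile(0.0, Q_DOSE, D, T), "cm^-3")
print("junction depth:", junction_depth(Q_DOSE, D, T, NB) * 1e4, "micrometers")
```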
The chemical predeposition step is being replaced by ion implantation directly through the oxide. The masking is accomplished by placing a layer of photoresist on top of the oxide. This eliminates the oxide etching step and allows an accurately metered deposition by controlling the time of bombardment and the current of the ion beam. This modification of the process promises to keep the emitter and base region sheet resistivities within about ±1% of the design value. See ION IMPLANTATION. In this transistor, formation of the base region by diffusion from the emitter side produces a steep doping gradient and thereby a strong electric field in the base region. In the typical alloy-junction transistor (uniform base doping) the minority-carrier transport across the base is achieved by a relatively slow diffusion process. In this diffused base type (sometimes called a drift transistor) the base region shows a high conductivity gradient, decreasing from the emitter to the collector (Fig. 1d). This conductivity gradient means that the majority-carrier concentration is much greater near the emitter than near the collector. In order to cancel the natural diffusion of majority carriers from the high- to the low-concentration region, an electric field must exist of such a polarity as to tend to drive majority carriers back toward the emitter. This polarity of field then tends to drive minority carriers from the emitter to the collector; when normal bias is applied to the device, excess injected minority carriers will be accelerated across the base by this field. The buried layer of n+-doped material has very low resistance and acts as a shorting bar between the area immediately beneath the emitter and the area immediately beneath the collector contact stripe, thus maintaining a low collector series resistance even if the n− material of the epitaxial region is of quite high resistivity. The collector breakdown voltage may be maintained at a reasonably high value and the collector capacitance at a low value by controlling the thickness of the n− material between the base and the buried layer and by keeping the doping level of the n− material quite low. Mesa transistors. These transistors minimize the collector capacitance by limiting the collector junction area. This area limitation is achieved by etching away the surrounding material so that the entire transistor structure stands up from the body of the wafer like a small mesa. This structure gave the highest frequency response for many years. It is now replaced by the planar type of structure. Power transistors. These are used in the output stage of an electronic circuit both as switches and as amplifiers. Depending on the load, a high voltage rating, a high current rating, or a high power rating may be required. With any of these, heat dissipation within the device is a serious limitation. In order to obtain high current capability in a power transistor, a large emitter junction area is required. The base-region recombination current produces a lateral voltage drop between the center of the emitter area and the center of the base contact area in a planar device. This voltage tends to bias the center of the emitter
area to off and concentrate the injection current at the periphery of the emitter. Modern silicon power transistors minimize this effect by using emitter junctions with a large perimeter-to-area ratio, usually in the form of a multipronged fork with base contacts interdigitated between the tines. This preserves high forward current-transfer ratio to large values of emitter current. See CONTROLLED RECTIFIER. Unijunction transistor. This device is really a diode in which the resistance of a portion of the base region is modulated by minority carriers injected by forward-biasing its single junction. Its structure typically consists of a lightly doped base region with ohmic contacts at opposite ends. The single junction is formed over a narrow range near the center of the base region by a shallow diffusion of heavily doped material of the opposite conductivity type. If a bias current is set up from end to end in the base, the potential at the junction can be set at a desired reverse bias relative to ground. If a signal is applied to the junction electrode, the device will turn on when the signal exceeds the original reverse-bias potential of the base at that point. Once forward-biased, the junction injects sufficient minority carriers into the base to short the region beneath the junction to the ground end of the base, and the device remains conducting until reset, either by the base bias or by the emitter signal. These devices show a typical negative resistance characteristic and are used for timing, control, and sensing circuits. Summary. Silicon planar passivated transistors show a wide range of performance with characteristic frequencies up to 2000 MHz, voltage ratings of 12–700 V, and power dissipation ratings of 100 mW–300 W. The highest-frequency devices range up to 4000 MHz. Silicon planar technology is used in fabricating integrated circuit chips. The general form of the transistor structure displayed in Fig. 2 is used in integrated circuits. Such a structure is used for diodes as well as transistors since, for example, it is necessary only to connect the base and collector contacts to use the collector junction as a diode. See INTEGRATED CIRCUITS. Lloyd P. Hunter Bibliography. J. J. Brophy, Basic Electronics for Scientists, 5th ed., 1990; G. W. Neudeck, The Bipolar Junction Transistor, 2d ed., 1989; S. M. Sze, Semiconductor Devices, 1985; E. S. Yang, Microelectronic Devices, 1988.
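As a rough illustration of the interdigitated emitter geometry described under Power transistors above, the following sketch compares the perimeter-to-area ratio of a single square emitter with that of the same area split into narrow fingers; all dimensions are arbitrary assumed values, not data from any particular device.

```python
def square_ratio(side_um):
    """Perimeter-to-area ratio (1/um) of a single square emitter."""
    return 4.0 * side_um / (side_um * side_um)

def finger_ratio(n_fingers, length_um, width_um):
    """Perimeter-to-area ratio (1/um) of n separate emitter fingers."""
    area = n_fingers * length_um * width_um
    perimeter = n_fingers * 2.0 * (length_um + width_um)
    return perimeter / area

# Same total area (40,000 um^2) as one 200 um x 200 um square,
# split into 20 assumed fingers of 200 um x 10 um.
print(square_ratio(200.0))            # 0.02 per micrometer
print(finger_ratio(20, 200.0, 10.0))  # 0.21 per micrometer, roughly 10x more periphery
```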
Jungermanniales The largest order of liverworts, often called the leafy liverworts; it consists of 43 families. The leaves are in three rows, with the underleaves usually reduced or lacking. Other distinctive features include a perianth formed by a fusion of modified leaves, a short-lived seta, and a four-valved capsule. The leaves have an embryonic bilobed phase which may be lost on further development. The plants of this order are dorsiventrally organized and leafy. They grow by means of an apical cell with three cutting faces, resulting in two
rows of lateral leaves and a third row of underleaves which are generally reduced, and sometimes lacking. The stems lack a central strand. A medulla of delicate cells is generally surrounded by a cortex. Rhizoids are usually present, all smooth. The leaves pass through a primordial two-lobed stage but may become two- to several-lobed, or unlobed (owing to obliteration of one primordial lobe). A midrib is lacking. The cells often have corner thickenings, and oil bodies are usually present in all green cells. Asexual reproduction by gemmae is common. Antheridia occur in leaf axils, sometimes also in axils of underleaves. The stalk is usually 1–2-seriate or, less commonly, 4–6-seriate. Archegonia are terminal and have a 5-seriate neck. The sporophyte is usually protected by a perianth (in addition to a calyptra) formed by the fusion of leaves. The seta, usually long, consists of delicate, hyaline cells. The capsule is four-valved, with a wall of 2–10 layers of cells. The elaters have spiral-thickened walls. The haploid chromosome number is 8 or 9. See BRYOPHYTA; HEPATICOPSIDA; JUNGERMANNIIDAE. Howard Crum Bibliography. M. Fulford, Manual of the leafy Hepaticae of Latin America, Mem. N.Y. Bot. Gard., 11:1–172 (1963), 173–276 (1966), 277–392 (1968), 393–535 (1976); R. M. Schuster, The Hepaticae and Anthocerotae of North America East of the Hundredth Meridian, vols. 1–4, 1966–1980.
Jungermanniidae One of the two subclasses of liverworts (class Hepaticopsida). The plants may be thallose, with little or no tissue differentiation, or they may be organized into erect or prostrate stems with leafy appendages. The leaves, generally one cell in thickness, are mostly arranged in three rows, with the third row of underleaves commonly reduced or even lacking. Oil bodies are usually present in all of the cells. The rhizoids are smooth. The capsules, generally dehiscing by four valves, are usually elevated on a long, delicate, short-lived seta. The spore mother cells are deeply lobed. The subclass consists of the orders Takakiales, Calobryales, and Jungermanniales, which are leafy, and the Metzgeriales, which are mostly thallose. See BRYOPHYTA; CALOBRYALES; HEPATICOPSIDA; JUNGERMANNIALES; METZGERIALES; TAKAKIALES. Howard Crum Bibliography. H. Inoue, Illustrations of Japanese Hepaticae, vol. 1, 1974; R. M. Schuster, Phylogenetic and taxonomic studies on Jungermanniidae, J. Hattori Bot. Lab., 36:321–405, 1972.
Jupiter The largest planet in the solar system, and the fifth in the order of distance from the Sun. It is visible to the unaided eye except for short periods when in near conjunction with the Sun. Usually it is the second brightest planet in the sky; only Mars at its maximum luminosity and Venus appear brighter.
Planet and its orbit. The main orbital elements are a semimajor axis, or mean distance to the Sun, of 484 × 106 mi (778 × 106 km); an eccentricity of 0.0489, causing the distance to the Sun to vary about 47 × 106 mi (75 × 106 km) between perihelion and aphelion; sidereal revolution period of 11.86 years; mean orbital velocity of 8.1 mi/s (13.1 km/s); and inclination of orbital plane to ecliptic of 1.3◦. See PLANET. The apparent equatorial diameter of its disk varies from about 47 arcseconds at mean opposition to 32 arcseconds at conjunction. The polar flattening due to its rapid rotation is considerable and is easily detected by visual inspection; the ellipticity is (re − rp)/re = 0.065, where re is the equatorial radius and rp is the polar radius. The equatorial diameter is about 88,850 mi (142,984 km), and the polar diameter is 83,082 mi (133,708 km). The volume is about 1321 (Earth = 1), and the mass is about 317.5 (Earth = 1). The mean density is 1.33 g/cm3, a low value characteristic of the four giant planets. The mean acceleration of gravity at the visible surface is about 85 ft/s2 (26 m/s2); however, the centrifugal force at the equator reduces the effective acceleration of gravity to about 78 ft/s2 (24 m/s2). Phases. As an exterior planet, Jupiter shows only gibbous phases and full phase from Earth. Because of the large size of Jupiter's orbit compared with that of the Earth, the maximum phase angle is only 12◦ at quadratures, and the phase effect shows up only as a slightly increased darkening of the edge at the terminator. The apparent visual magnitude at mean opposition is −2.4, and the corresponding value of the reflectivity (geometrical albedo) is about 0.3; the physical albedo is 0.34. The high value of the albedo, characteristic of the four giant planets, indicates the presence of a dense, cloud-laden atmosphere. See ALBEDO. Telescopic appearance. Through an optical telescope Jupiter appears as an elliptical disk, strongly darkened near the limb and crossed by a series of bands parallel to the equator (Fig. 1). Even fairly small telescopes show a great deal of complex structure in the bands and disclose the rapid rotation of the planet. The period of rotation is about 9 h 55 m, the shortest of any planet. The features observed, however, do not correspond to the solid body of a planet but to clouds in its atmosphere, and the rotation period varies markedly with latitude. The rotation period of any given zone is not exactly constant but suffers continual fluctuations about a mean value. Occasionally, short-lived atmospheric phenomena may depart more strongly from the mean rotation period of the zone in which they appear, and thus drift rapidly with respect to other details in the zone. The rotation axis is inclined only 3◦ to the perpendicular to the orbital plane, so that seasonal effects are practically negligible.

Fig. 1. Jupiter. (a) Telescopic appearance from the Hubble Space Telescope (Space Telescope Science Institute: Jet Propulsion Laboratory; NASA). (b) Principal bands, from south to north: south polar region, S.S.S. temperate belt, S.S. temperate zone, south temperate belt, south tropical zone, south tropical belt, equatorial zone, north tropical belt, north tropical zone, north temperate belt, N.N. temperate zone, N.N.N. temperate belt, and north polar region; the Red Spot is marked in the southern tropical region.

Red Spot. Apart from the constantly changing details of the belts, some permanent or semipermanent markings have been observed to last for decades or even centuries, with some fluctuations in visibility. The most conspicuous and permanent marking is the
great Red Spot, discovered by J. D. Cassini in 1665 and observed continually since 1878, when it gained its present red color. At times it has been conspicuous and strongly colored; at other times it has been faint and only slightly colored, and occasionally only its outline or that of the bright “hollow” of the south temperate zone which surrounds it has remained visible. The current dimensions of the Red Spot are 15,500 mi (25,000 km) in longitude and 7500 mi (12,000 km) in latitude, though the longitudinal dimension varies from 25,000 mi (40,000 km) to 9000 mi (14,000 km). The Spot has been shrinking since 1943 at a rate of 0.19◦ of longitude per year. It also drifts in longitude by 4.5◦ per year. It has an essentially calm center, surrounded by a restraining “collar” of high-speed winds which rotate counterclockwise in 6–10 days. This anticyclonic storm (rotation in opposite direction to the planet’s) consists of an elevated central region of ammonia gas and ice clouds (Fig. 2). Wind speed increases with depth. The Spot is surrounded by towering water-vapor-based thunderstorms, with the deepest clouds in the surrounding collar. The total depth is probably 12–25 mi (20–40 km). These clouds are about 18◦F (10◦C) colder than the surrounding weather. The Spot’s distinctive coloration is probably due to chemical compounds (perhaps containing phosphorus) transported from deep within the atmosphere. The origin and longevity of the Great Red Spot remain difficult to explain. The vortex is probably not attached to any solid surface feature below the clouds. It is thought to be an eddy of atmospheric gases driven by the strong Coriolis force that stems from the planetary rotation. Atmospheric perturbations or whirlpools coalesce as they pass in the underlying jet stream, forming a
vortex of increasing size, which can maintain stability for some time. Several smaller and similarly colored versions of the Spot have been observed, but none has attained the size or lifetime of the original. Galileo probe observations suggest that the energy necessary to sustain Red Spot survival comes from the Spot's regular consumption of these eddies several thousand kilometers in diameter, which carry up convective energy from below. This implies a possible lifetime around 100,000 years. See CORIOLIS ACCELERATION.
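A quick consistency check of the apparent-diameter figures quoted under Telescopic appearance: using the equatorial diameter and mean Sun–Jupiter distance given above, together with an assumed Earth mean orbital radius of about 93 × 106 mi (149.6 × 106 km), a value not stated in this entry, the small-angle formula reproduces disks of roughly 47 and 32 arcseconds.

```python
import math

JUPITER_EQ_DIAMETER_KM = 142_984   # equatorial diameter quoted in this entry
JUPITER_SUN_KM = 778e6             # mean Sun-Jupiter distance quoted in this entry
EARTH_SUN_KM = 149.6e6             # assumed Earth mean orbital radius (not from the entry)

def apparent_diameter_arcsec(distance_km):
    """Small-angle apparent equatorial diameter of Jupiter, in arcseconds."""
    return math.degrees(JUPITER_EQ_DIAMETER_KM / distance_km) * 3600.0

print(apparent_diameter_arcsec(JUPITER_SUN_KM - EARTH_SUN_KM))  # ~47" near opposition
print(apparent_diameter_arcsec(JUPITER_SUN_KM + EARTH_SUN_KM))  # ~32" near conjunction
```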
Fig. 2. Details of the Red Spot as seen by the Voyager 1 flyby. Features include a white oval with a wake of counterrotating vortices, puffy features inside the Red Spot, and reverse-S spirals inside both the Red Spot and the oval. The large white feature extending over the northern part of the Red Spot was observed to revolve about the Red Spot center with a period of 6 days. 5000 km = 3000 mi. (NASA)
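The Red Spot figures quoted above (a longitude drift of 4.5◦ per year and shrinkage of 0.19◦ of longitude per year) translate into rough linear rates only if a latitude is assumed for the Spot. The sketch below uses an assumed latitude of 22◦S and the equatorial radius from this entry, so the kilometer values are indicative rather than exact.

```python
import math

EQ_RADIUS_KM = 71_492          # half of the 142,984-km equatorial diameter from the entry
SPOT_LATITUDE_DEG = 22.0       # assumed planetographic latitude of the Red Spot
DRIFT_DEG_PER_YR = 4.5         # longitude drift quoted in the entry
SHRINK_DEG_PER_YR = 0.19       # longitudinal shrinkage quoted in the entry

# Kilometers per degree of longitude at the assumed latitude.
km_per_degree = (2.0 * math.pi * EQ_RADIUS_KM
                 * math.cos(math.radians(SPOT_LATITUDE_DEG)) / 360.0)

print("years for the drift to lap the planet:", 360.0 / DRIFT_DEG_PER_YR)        # ~80 years
print("approximate linear shrink rate:", SHRINK_DEG_PER_YR * km_per_degree, "km/yr")
```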
Atmosphere. Jupiter's visible belts and zones reflect the complicated vertical atmospheric structure. The atmosphere consists primarily of hydrogen (H2), helium (He), ammonia (NH3), methane (CH4), water (H2O), hydrogen sulfide (H2S), and other trace compounds such as acetylene (HC≡CH), ethane (CH3CH3), hydrogen cyanide (HCN), hydrogen deuteride (HD), carbon monoxide (CO), and phosphine (H3P). The highest atmospheric features are ammonia crystal haze layers, variable in appearance and density. The visible cloud tops are formed of aerosols of mostly ammonia, with little water vapor, akin to cirrus clouds in Earth's atmosphere. As depth increases, cloud particles become ammonia crystals and ammonium hydrosulfide (NH4SH) droplets, followed by cumulus clouds of water ice crystals and droplets (Fig. 3). All the compounds previously mentioned are colorless; therefore, the strong yellow, red, blue, and brown colorations seen in the clouds must be due to other agents. Perhaps phosphorus and sulfur compounds create the yellows and reds, while the blue features may be hydrogen made visible by Rayleigh scattering. Infrared observations have found the highest features to be red, followed by white, brown, and blue.

Fig. 3. Diagram showing the locations of the cloud decks of Jupiter. 1 km = 0.6 mi.

The Galileo probe found unexpected results in its plunge into the upper Jovian atmosphere. No dis-
tinct water cloud was observed, and less than 10% of the expected abundance of water vapor was seen, although two distinct ammonia and ammonium hydrosulfide cloud levels were seen. It is thought that the probe may have entered an unusually clear and dry area of the atmosphere, a figurative desert. More recent observations give an overall water content similar to solar values. When signal was lost, the probe had penetrated 0.22% of the Jovian atmosphere (about 43 mi or 70 km) where the temperature was 1970◦F (1350 K, hotter than expected). This excess heat may have been caused by gravity waves from the lower atmosphere or else heating by soft electron collisions on the hydrogen (H2) molecules. The ammonia cloud was very tenuous and inhomogeneous near the probe entry point. An ammonia-sulfur cloud was encountered well below its expected height, pointing out the extreme variability of the atmosphere. The probe also found helium raining from upper to lower depths. The Galileo probe discovered anomalously high quantities of argon, krypton, and xenon, suggesting that Jupiter had to trap them physically by condensation or freezing. This would require Jovian formation farther out in the solar system than its present location, possibly in the Kuiper Belt. See KUIPER BELT. The atmospheric structure displays latitudinal variations. Clouds near the equator are lifting and spreading toward higher latitudes, as in the horse latitudes on Earth. Atmospheric aerosols subside at the equator and upwell at high latitudes. The mean temperature of the Jovian disk is −227◦F (129 K), but bands of high temperature are found at the poles and midlatitudes. The equatorial regions alternate between warm and cold over a cycle of 4–5 years. Zones and belts. The belts and zones represent regions of differing cloud altitudes and compositions. The atmosphere is divided by a series of prograding and retrograding jet streams. Convection is strong since the bright zones are cooler and higher by 9–12 mi (15–20 km) than the dark belts, with their higher albedos arising from solid ammonia crystals. The zones are regions of upward motion and high pressure. Galileo found wind speeds of 0.1–0.15 mi/s (160–220 m/s) to extend to unexpected depths, as well as supersonic turbulence in the upper atmosphere on the order of 0.6–6 mi/s (1–10 km/s). This is similar to motions in stellar, not planetary, atmospheres, suggesting particularly stormy upper regions. The energy to maintain such high-velocity winds likely comes from fluid-mechanical convective processes from the interior heat source (discussed below), with some contribution from lightning. The equatorial zonal jet streams have not sensibly changed in over 80 years of observation, despite vigorous turbulence in the defining shear zones. Belts are aerosol-free and dry; zones are cloudy and ammonia-rich. The motions in the belts are cyclonic, and anticyclonic in the zones. The anticyclones have smooth, nearly elliptical cloud shapes, while the cyclones have large, filamentary, diffuse cloud patterns (Fig. 4). The large weather systems
move more slowly than the jets that constrain them. Individual vortices form within the zones and belts, some persisting for decades. The major white ovals are hot-spot regions of strong infrared emission and are exceptionally dry regions of descending air, while other features such as the plumes observed in the equatorial region may be surface phenomena. The great scale changes in the belts and zones can be gradual (time scales of several months) or cataclysmic and sporadic (several days). For example, in 1990 a new temperate belt was created over a 3-month period as several bright features and white and dark spots coalesced.

Fig. 4. Fine details of Jovian atmospheric features as photographed by Voyager 1. The dark halo surrounding the smaller bright spot to the right of the large oval is a hot spot, a warmer region of Jupiter's atmosphere. (NASA)

Oscillations. Jupiter has global acoustic-mode radial oscillations, discovered in 1991, with periods of 8–17 min. Therefore, the entire planetary atmosphere also has cells of regular vertical motion which may contribute to the complexities of its meteorology. Comet impact. During July 16–22, 1994, twenty-one observable fragments of Comet Shoemaker-Levy 9 impacted the Jovian atmosphere at nearly 120,000 mi/h (200,000 km/h), the first time such an event had been witnessed. Though occurring on Jupiter's nonvisible side, each strike was clearly visible as a dark area in the south temperate zone as the planet's rapid rotation brought it into view. In addition, ejecta and impact plumes were seen to rise approximately 1860 mi (3000 km) above the planetary limb. The impact of fragment G, thought to be the largest at 2–3 mi (3–4 km) in diameter, released energy equivalent to 6 × 1012 tons of TNT and left a dark spot 1550 mi (2500 km) in diameter surrounded by concentric rings (Fig. 5), which expanded at 0.28 mi/s (0.45 km/s). Each strike produced superheated gaseous bubbles, observed at infrared wavelengths, which rose above the stratospheric methane and cooled into aerosols (silicate dust, hydrocarbons, or hydrogen cyanide), as well as increased aurorae near the sites. The event provided a first probe into the Jovian
atmosphere. Traces of ammonia, sulfur (S2), and hydrogen sulfide in the fireballs showed that they penetrated the upper two cloud layers of ammonia and ammonium hydrosulfide, but the absence of water vapor suggests that the comet itself was low in oxygen-rich ices and volatiles. Surges in Jupiter's decimetric radio emission during the impacts suggested that large numbers of electrons were injected into the Jovian magnetosphere. Given the number of such objects in the solar system, it is thought that Jupiter is impacted once every 500 years by a 0.2-mi-diameter (0.3-km) object and every 6000 years by a 1-mi (1.6-km) comet such as Shoemaker-Levy 9. See COMET.

Fig. 5. Image of Jupiter from the Hubble Space Telescope showing impact sites of fragments D and G of Comet Shoemaker-Levy 9. The large feature with a ring and crescent-shaped ejecta was created by fragment G. The smaller feature to the left of the fragment G impact site was created by fragment D. (H. Hammel and NASA)

Interior composition and structure. Jupiter primarily consists of liquid and metallic hydrogen. Early measures of the ratios of helium, carbon, and nitrogen to hydrogen gave values resembling those of the Sun, and therefore the primordial composition of the solar system. However, later analyses of the methane spectrum showed a two- to threefold overabundance of carbon as compared to solar values, a result confirmed by gravity analyses of the rocky core. Still in a late phase of its gravitational
contraction, the planet converts the released gravitational energy into heat, emitting 1.668 times as much thermal energy as it receives from the Sun. Below the clouds is a 13,000-mi-thick (21,000-km) layer of hydrogen and helium, which changes from gas to liquid as the pressure increases. Beneath the liquid hydrogen layer is a 25,000-mi-deep (40,000-km) sea of metallic hydrogen, where the pressure is approximately 3 million times that at Earth’s sea level. This layer is electrically conduc-
tive and causes Jupiter's intense magnetic field. Acoustic oscillations confirm the presence of a small high-density solid core, perhaps 1.5 times the Earth's diameter, with a density 10–30 times that of Earth's mean density and a temperature of 55,000◦F (30,000◦C) (Fig. 6). Differentiation occurs. The metallic envelope has a mass fraction of heavy elements that is less than 7.5 times the solar value, while this value ranges from 1 to 7.2 in the molecular envelope. The Galileo probe found that the helium-to-hydrogen ratio was 0.156 (near solar), but that carbon, sulfur, and xenon were enhanced and oxygen and neon depressed with respect to solar values. The large abundance of carbon suggests that gases other than those from the solar nebula contributed to the composition of Jovian volatile gases. Probably the entire planet holds 11–45 earth masses of elements other than hydrogen and helium.

Fig. 6. Schematic diagram of the interior of Jupiter.

Fig. 7. Jovian magnetic environment, showing the bow shock, magnetosheath, magnetopause, magnetosphere and magnetosphere disk, Io plasma torus, and magnetotail. (From Return to Jupiter, Astronomy, 7(9):6–24, 1979)

Magnetosphere. Jupiter possesses the strongest magnetic field and most complex magnetosphere of any planet in the solar system (Fig. 7). The Jovian magnetosphere is the largest object in the solar system, extending well beyond the orbit of Saturn, which sometimes passes through it. If it could be seen visually from Earth, it would subtend several times the diameter of the full moon on the sky. The Jovian magnetic field rotates with the rotational period of the planet and contains an embedded plasma trapped in the field. At the distance of the satellite Io, the field revolves faster than the satellite, and so numerous collisions occur with the atmospheric gas of that body, resulting in the stripping away of 1028–1029 ions per second. The energy involved slows the magnetic field, and so, beyond Io, the magnetic field no longer rotates synchronously with the planet. The ions removed from Io spiral around the magnetic lines of force, oscillating above and below the plane of Io's orbit. This ring of ions is known as the Io plasma torus and emits strongly in the ultraviolet. The motion of Io through the magnetosphere creates a 400,000-V, 2 × 1012 W circuit, sufficient to cause pronounced aurorae in both the equatorial and polar regions of the satellite. Except near the planet, the major component of the Jovian magnetic field is dipolar and opposite in direction from that of the Earth. At a distance of three Jovian radii, the field strength is 0.16 gauss (16 microtesla). Closer to the planet, the field is quadrupolar and octopolar. There the field strength varies from 3 to 14 gauss (0.3 to 1.4 millitesla). The magnetic axis is aligned about 10◦ with respect to the rotational axis. The inner radiation ring flux intensity peaks at 2.2 Jovian radii, while its innermost edge is at 1.35 Jovian radii, well above the atmospheric cut-off. Radiation is also absorbed near the outer edge of the bright dust ring. Ion sources for the magnetospheric plasma are the solar wind; the Jovian satellites, particularly Io; and the ionosphere itself. Ion temperatures in the Io plasma torus reach 17,500◦F (10,000 K). The dayside Jovian magnetosphere extends to 105 Jovian radii. On the duskside high-latitude region, intense
Jupiter fluxes of counterstreaming ions and electrons from the edge of the plasma torus to the duskside magnetosphere are aligned tightly with the magnetic field, superimposed on a hot plasma background. In addition, a thin plasma sheet (its depth at 15 Jovian radii is 2 radii) has temperatures of 100–400 eV. At polar latitudes, energetically charged particles following the Jovian magnetic lines of force may have access to the interplanetary medium. The Jovian magnetotail extends at least 4.5 astronomical units (4 × 108 mi or 6.5 × 108 km). See SOLAR WIND. Jupiter possesses the most powerful aurorae in the solar system. They are observed at soft x-ray, ultraviolet, visible, and infrared wavelengths. Energies are 1000 times higher than in terrestrial aurorae. Protons, electrons, sulfur, and oxygen ions (from Io) precipitate onto the planet near its magnetic poles. These also form H3+ (trihydrogen) ions, the first ever observed. The Galileo probes observed visible auroral rings, highly structured and rapidly varying in time, 300–5000 mi (500–8000 km) in diameter at an altitude of 150 mi (240 km) above the 1-bar pressure level. See AURORA; MAGNETOSPHERE. Radio astronomy. Jupiter produces distinct types of radio emission. Thermal radiation from the high stratosphere is detectable at wavelengths below about 10 cm and indicates temperatures in the upper emitting layers of −280 to −225◦F (100 to 130 K). Microwave nonthermal emission in the band from about 3 to 70 cm arises from synchrotron radiation from relativistic electrons in extended Jovian Van Allen belts. Decametric radio emission (of frequencies 10–23 MHz) is also formed by synchrotron emission of these electrons and is 100% elliptically polarized. It occasionally produces S-bursts, with durations of tens of milliseconds, caused by instabilities of the electron beam in the Io-Jupiter flux tube. Hectometric emission (around 500 kHz) is probably rooted in the auroral ovals. The Ulysses probe detected radio emission at 3–30 kHz. Some radio noise is thought to originate in the lightning storms first observed by Voyager 2 at depths of 50 mi (80 km) below the ammonia clouds. See POLARIZATION OF WAVES; SYNCHROTRON RADIATION; VAN ALLEN RADIATION. Subsequent observations by the Galileo space probe have shown that lightning occurs in the Jovian atmosphere at a rate of approximately 20 bolts per second, approximately one-tenth the Earth’s rate. The storms producing this lightning have been observed to be over 600 mi (1000 km) in diameter and over 45 mi (75 km) tall. Most lightning storms are concentrated between north latitudes 47 and 49◦. Periodicities in the radio-noise storms and rocking of the polarization plane of the microwave nonthermal emission led to the well-determined radio rotation period 9h55m29.7s. The difference between the radio and the various other observed Jovian rotation periods suggests that the core of Jupiter is rotating about 13 s faster than the mantle and that angular momentum may be significantly exchanged among the core, mantle, and atmosphere over periods of years. See RADIO ASTRONOMY.
Jovian ring. Voyager 1 and 2 first detected a faint ring encircling Jupiter. Later probes found that Jupiter possesses a tripartite ring system. There is a flat main ring, a lenticular toroidal halo interior to this, and a gossamer ring which is exterior to the main ring. The entire ring system begins at about 55,000 mi (92,000 km) from Jupiter’s center and extends to 150,000 mi (250,000 km). The main ring has a radial structure with dips in its apparent brightness due to the perturbations of the closer satellites. It is largely made up of 0.3–1.6-micrometer particles with human-scale lifetimes, whose source is dust and debris knocked off the Jovian moons XV Adrastea and XVI Metis by impacts with meteoroids, comets, and asteroids. Its inner and outer radii are respectively 76,900 mi (123,000 km) and 80,590 mi (128,940 km). The estimated mass of all particles forming it is 1 × 1013 kg. The toroidal halo interior to this ring was probably formed by electromagnetic forces pushing the smallest grains out of the ring plane. The exterior gossamer ring has two components, one associated with V Amalthea and the other with XIV Thebe. The Amalthea ring extends to precisely the orbital distance of the satellite and ends abruptly, while extending to one-half the thickness of the moon’s excursion off the Jovian equatorial plane. The external gossamer ring particles orbit in retrograde orbits. The Amalthea ring forms the largest and sparsest ring known in the solar system. The material appears to be a dark reddish soot, consistent with the release of small particles from the two moons. Satellites. As of October 2003, sixty-four known satellites had been discovered to orbit Jupiter, the largest number for any planet in the solar system. The four largest are I Io, II Europa, III Ganymede, and IV Callisto, discovered by Galileo in 1610 in Italy and independently by Simon Marius in Germany, who named them (though they are collectively known as the Galilean satellites) [see table]. The four Galilean satellites are of fifth and sixth stellar magnitudes and would be visible to the naked eye if they were not so close to the much brighter parent planet. They are easily visible in binoculars. The majority of the others are very faint objects only recently discovered with large telescopes. The planes of the orbits of the major satellites are inclined less than 0.5◦ to the equatorial plane of Jupiter, so that with the occasional exception of IV, they are eclipsed in Jupiter’s shadows and transit in front of its disk near conjunction. The eclipses, transits, and occultations of Jupiter’s satellites led to the discovery of the finite velocity of propagation of light by O. Roemer in 1675. Satellites VIII, IX, XI, and XII have retrograde motion. See ECLIPSE; OCCULTATION; TRANSIT (ASTRONOMY). The four Galilean satellites are orbitally connected. It is thought that Io was once much closer to Jupiter. Over the course of 100 million years, rotational energy was transferred to the satellite, causing it to move outward until it locked Europa into a 2:1 synchronous orbit. The two satellites then spiraled out as a pair and changed the eccentricity of Ganymede’s orbit. Eventually, the mutual synchronicity will begin
to affect the orbit of Callisto and lock it in, too. The gravitational effects of proximity of Jupiter and the slight elliptical orbits cause surface flexure and tidal heating strong enough to produce internal melting and major upheaval. All four satellites revolve within the Jovian magnetosphere, causing heavy ion bombardment of their surfaces.

The 16 brightest satellites of Jupiter
Satellite | Diameter, mi (km) | Mean distance from center of planet, 103 mi (103 km) | Orbital period, days | Magnitude at mean opposition | Year of discovery | Rotation* | Mass, 1020 lb (1020 kg)
XVI Metis | 37 × 21 (60 × 34) | 80 (128) | 0.294 | 17.5 | 1979 | Sync. | 0.002 (0.001)
XV Adrastea | 12 × 9 (20 × 14) | 80 (129) | 0.297 | 18.7 | 1979 | Sync. | 0.0004 (0.0002)
V Amalthea | 155 × 80 (250 × 128) | 113 (181) | 0.498 | 14.1 | 1892 | Sync. | 0.16 (0.07)
XIV Thebe | 72 × 52 (116 × 84) | 138 (222) | 0.674 | 16.0 | 1979 | Sync. | 0.017 (0.008)
I Io | 2256 (3630) | 262 (422) | 1.769 | 5.0 | 1610 | Sync. | 1971 (894)
II Europa | 1950 (3138) | 417 (671) | 3.551 | 5.3 | 1610 | Sync. | 1058 (480)
III Ganymede | 3270 (5262) | 665 (1,070) | 7.155 | 4.6 | 1610 | Sync. | 3267 (1482)
IV Callisto | 2983 (4800) | 1170 (1883) | 16.689 | 5.6 | 1610 | Sync. | 2372 (1076)
XIII Leda | 10 (16) | 6893 (11,094) | 240 | 19.5 | 1974 | Unknown | 0.00013 (0.00006)
VI Himalia | 116 (186) | 7133 (11,480) | 251 | 14.6 | 1904 | Nonsync. | 0.21 (0.10)
X Lysithea | 22 (36) | 7282 (11,720) | 260 | 18.3 | 1938 | Nonsync. | 0.0017 (0.0008)
VII Elara | 47 (76) | 7293 (11,737) | 260 | 16.3 | 1905 | Nonsync. | 0.017 (0.008)
XII Ananke | 19 (30) | 13,173 (21,200) | 631 | 18.8 | 1951 | Nonsync. | 0.0008 (0.0004)
XI Carme | 25 (40) | 14,043 (22,600) | 692 | 17.6 | 1938 | Unknown | 0.002 (0.001)
VIII Pasiphae | 31 (50) | 14,602 (23,500) | 735 | 17.0 | 1908 | Unknown | 0.004 (0.002)
IX Sinope | 22 (36) | 14,726 (23,700) | 758 | 18.1 | 1914 | Nonsync. | 0.0017 (0.0008)
*Synchronous or nonsynchronous. SOURCE: After Lunar and Planetary Laboratory, University of Arizona; and J. A. Wood, Forging the planets: The origin of our solar system, Sky Telesc., 97(1):36–48, January 1999.

The Jovian satellites fall into eight groups, based on their orbital characteristics. The Small Inner Irregulars and Rings (XVI Metis, XV Adrastea, V Amalthea, XIV Thebe) are closest to the planet and govern ring structure. Irregular refers to their origin: they are probable captures from the asteroid belt after Jupiter's formation. The Galilean moons (as above) undoubtedly formed independently from the solar nebula or else as agglomerations of nebular material, likely at the same time as Jupiter. The Themisto (XVIII) and Himalia (VI) Prograde Irregular Groups comprise five satellites in total, all with moderately high eccentricities. They, too, are likely captures and orbit the planet in prograde orbits. Three other groups (Retrograde Irregular, Carme [XI] Retrograde Irregular, and Pasiphae [VIII] Retrograde Irregular) orbit Jupiter in a retrograde direction and have moderate eccentricities. The newest discoveries have not been assigned to a group, though they share similar orbits and may be fragments of a single large body that broke up. In appearance, they are very dark and quite faint.

Fig. 8. Voyager 1 photographs of the Galilean satellites, shown to scale. (a) Io. (b) Europa. (c) Ganymede. (d) Callisto. 1 km = 0.6 mi. (NASA)

Io. The close approaches of the Voyager and Galileo spacecraft as well as the superior imaging capabilities of the Hubble Space Telescope have shown the four Galilean satellites to be very different (Fig. 8). I Io is probably the most geologically active body in the solar system. Its surface landforms include active shield volcanoes, calderas, mountains, plateaus, flows, grabens, and scarps. There are strong concentrations of craters near the sub- and anti-Jovian points. High mountains are equally distributed over the surface and are not connected with volca-
noes. The tallest mountain known has an altitude of 52,000 ft (16 km). High-temperature (1700–2000 K) magnesium-rich silicate volcanism is common. Volcanism is likely due to a slight eccentricity in the orbit produced by the proximity of II Europa, which causes internal tidal forces to produce heat due to friction. Over 80 active volcanoes have been observed, though there are over 200 calderas larger than 12 mi (20 km) in diameter; by contrast, Earth has but 15. The most active volcanoes are Loki (responsible for half the satellite’s total heat output) and Pele. In 1999 a stupendous eruption was seen at Tvashtar Patera volcano. The lava fountain was observed to be over 0.9 mi (1.5 km) tall and 25 mi (40 km) long. The lava produced was hotter than 2420◦F (1600 K) and was visible to Earth-based telescopes observing in the infrared. Volcanic eruptions generally produce plumes resembling geysers containing fine dust and cold SO2 (sulfur dioxide) gas and snow with exit velocities of 500–1000 m/s. Other compounds such as diatomic sulfur (S2) are also ejected. Once on the surface, S2 changes to S3 and S4, which are bright red. Over time, these become S8 (the common yellow form of sulfur). Therefore, the most active volcanoes are surrounded by crimson rings which fade with time. Other bright colors on the surface are white and gray (SO2 frost), yellow and brown (sulfurous compounds), red and black (showing the most recent activity), and green (origin unknown). Since Io possesses an ionosphere, there are auroral glows near the volcanic plumes. It is thought that a global surface layer is deposited at a rate of 1 mm to 1 cm every year, thereby explaining the total lack of impact craters. The internal structure of Io is thought to consist of an iron and iron sulfide core, surrounded by a molten silicate interior and a 19-mi-thick-(30-km) silicate crust, able to support the mountains. Io has been found to have an intrinsic magnetic
field of an amplitude consistent with an internally driven dynamo. Therefore, it is a magnetized, solid planet. However, the field may turn on and off over million-year time scales as the interior recycles. Despite its small size, Io also possesses a variable atmosphere consisting primarily of sulfur dioxide (with minor components of oxygen, sodium, and sulfur), which has been detected at infrared, millimeter, and ultraviolet wavelengths. It is extremely localized, covering only 10% of the satellite at any given time. Atmospheric pressure is only about 1 microbar. Recently it has been found that the proportion of chlorine in the Io atmosphere is the highest of any object in the solar system and may result from a volcanic eruption component of salt (NaCl), proceeding from underground rivers of salt water. The Jovian magnetosphere sweeps away 1000 kg of volcanic gases and surface material per second. This is the source of material in the Io plasma torus. The sodium cloud thus produced is as large as the Jovian magnetosphere itself, and has been detected not only in the Jovian vicinity but also in interplanetary space. Europa. Europa possesses the highest albedo of any object in the solar system. The satellite has a varied surface terrain, dominated by tectonic deformation, impact cratering, and the possible emplacement of ice-rich materials and perhaps liquids on the surface. The leading hemisphere is covered with water frost, while the following is darker and redder, possibly caused by the bombardment of sulfur ions from Io. The underlying bright blue plains are the basic surface feature from which the other terrain types are derived. Tectonically deformed bright plains form the mottled regions through local disruptions that allow subsurface salts and clays to well up, akin to sea-floor spreading on Earth. There are striking linear features which range from several miles to 13 mi (20 km) across and which extend for 625 mi (1000 km) or more. These may demonstrate the existence of geyserlike cryovolcanism. Initial eruptions could involve gases or liquids plus rocks. Complex ridges also exist with heights of 330–660 ft (100–200 m), showing that parts of the icy crust have been modified by the intense faulting driven by energy from the planetary interior (Fig. 9). These ridges have a reddish coloration at their tops which may be from mineral contaminants in the ice. Finally, there are also impact craters with visible ejecta, though the centers are curiously muted, as though filled immediately with a slushy material. The age of the visible surface has been estimated to be less than 100 million years, due to the tectonic activity. The changing pattern of ridges is consistent with a nonsynchronous rotation of an ice shell with respect to a synchronously rotating interior. Dark patches may show venting of salt-laden water from below the ice. The core of Europa is thought to be iron-rich and is 781 mi (1250 km) in diameter. It is surrounded by a silicate mantle overlain by a subsurface ocean and a thin ice crust. The total ice shell thickness may be 6–94 mi (10–150 km), but it may thin to less than 0.6 mi (1 km) in some places. This is shown by visible
“rafting”: floating icebergs. That the salt-water ocean of Europa is still liquid below the surface is suggested by the observation that the slight magnetic field observed by Galileo changes direction to follow the Jovian magnetosphere. This could occur only in an induced field carried by a salt-rich true liquid layer. Europa also possesses a tenuous oxygen (O2) atmosphere. It is not inconceivable that simple life could exist in the Europan oceans, much the same way as life-forms congregate near thermal vents in Earth’s
oceans. This possibility led Galileo project managers to choose to destroy the probe in 2003 by allowing it to fall into the Jovian atmosphere, rather than to contaminate a possible Europan ecosystem with Earth microbes.

Fig. 9. Intricately textured ridged plains on Europa, photographed by Galileo on December 16, 1997, from a distance of 800 mi (1300 km). The region covered is about 12 mi (20 km) on a side, with a resolution of only 85 ft (26 m) per picture element. Multiple generations of overlapping ridges are visible, many of them double. (Jet Propulsion Laboratory)

Ganymede. The density of impact craters on the surface of III Ganymede shows that it is several billion years old. There are two major types of terrain: older, heavily fractured dark and rocky areas and brighter, less densely cratered areas rich in water and dry ice. Crossing the dark terrain are 3–6-mi-wide (5–10 km) furrows which may have formed from impacts which released water ice from below in a form of cryovolcanism. Dark material on the surface is likely meteoritic dust. Water and dry ice frost many of the impact craters. Fifty percent of the surface is made up of grooved terrain caused by tectonic stretching of the moon's surface into grabens. Some of the low-lying valleys were flooded with slush about 1 billion years ago, leading to some thought that a subsurface ocean may exist here, also. The partially molten iron core of Ganymede may take up to 50% of its diameter. This is surrounded by a silicate lower mantle, an icy upper mantle, and a thin ice crust. The satellite possesses a magnetic field with a strength 1% that of Earth's. It may possibly form in a salty, electrically conducting layer of water some 110 mi (170 km) below the surface. Ganymede possesses a slight atomic hydrogen and O2-O3 (ozone) atmosphere both above the surface and trapped within the surface ice. In fact, ozone is surprisingly abundant near the poles, while the O2 is found near the equator. There is faint fluorescence of oxygen in the ultraviolet at the poles, forming an aurora, while there is a visible light aurora at the equator. A dusting of frost near the poles may derive from the manner in which Ganymede's magnetic field interacts with Jupiter's. Ions and electrons preferentially cascade over the polar regions, which may cause sublimation of water which then accumulates as piles of frost. Callisto. Since it is similar in radius and density, IV Callisto should have a geological history resembling that of Ganymede, but does not. It is the most globally heavily cratered of the Galilean satellites, with large impact craters (for example Valhalla, with a diameter of 375 mi [600 km]) formed early in its history. The heavy cratering is indicative of its great age, but there is a curious lack of small craters, suggesting that there is ice-rich debris that subsumes them. Sublimation erosion by the Sun is the dominant process, causing crater rims and ridges to become heavily eroded. A long chain of craters is visible which was likely caused by the impact of a comet akin to Shoemaker-Levy 9. The centers of impact craters are often very bright, suggesting frost deposits. There is definite SO2 frost concentrated on the forward-facing hemisphere. The impact craters can be muted, as if water ice slush filled them after meteoritic impact. There are few signs of internal geology. The crust may be 124 mi (200 km) thick, with a salty ocean 6 mi (10 km) thick beneath. Below this is compressed rock and ice, most forming an
undifferentiated core. The liquid ocean is suggested by magnetic irregularities observed by Galileo; there is no intrinsic magnetic field, nor is there an atmosphere. This curious internal structure shows that Callisto formed in a cold environment, unlike the other three Galilean satellites which achieved differentiated cores by forming in a hot region. Other satellites. The Galileo spacecraft was the first to image the four inner satellites of Jupiter, XVI Metis, XV Adrastea, V Amalthea, and XIV Thebe. All are 25– 35% brighter on their leading sides than on their trailing sides. Metis and Adrastea are redder in coloration. Bright spots were seen on Amalthea and Thebe (diameters less than 13 mi [20 km]), as well as what appeared to be higher albedo patches on ridges and crater rims. These probably represent deposition of material from Io. The major geologic features on all four inner satellites are impact craters. The largest is 41 mi (60 km) across on Amalthea. Since its density is too low to allow it to be a solid object, Amalthea is almost certainly a loose agglomeration, held together by gravity. If it ever was solid, it may have been shattered into sand and boulder-sized rubble by impacts. VI Himalia has been shown from photometry to consist of carbonaceous chondritic material, and probably the remainder of the satellites of Jupiter are constructed of similar material. Satellites VII–XII are asteroidal in nature. They resemble the Trojan asteroids in their colors and albedos, and may share a common origin. They are less elliptical in shape than the other small Jovian satellites, perhaps because they spent more time in the proto-Jupiter nebula. However, their properties are also consistent with a capture origin, and their orbits are subject to large perturbations by the Sun. Thus they may form part of a fluctuating population gained and lost over very long time spans. See ASTEROID; PLANETARY PHYSICS; TROJAN ASTEROIDS. Elaine M. Halbedel Bibliography. J. K. Beatty, Galileo: An image gallery II, Sky Telesc., 93(3):28–35, March 1997; J. K. Beatty and D. H. Levy, Crashes to ashes: A comet’s demise, Sky Telesc., 90(4):18–26, October 1995; J. K. Beatty, C. C. Petersen, and A. Chaikin (eds.), The New Solar System, 4th ed., 1998; M. J. S. Belton (ed.), In Orbit at Jupiter: Contributions of the Galileo Imaging Science Team, NASA Galileo Proj. Doc. 625–801, 2000; M. J. S. Belton et al., Galileo’s first images of Jupiter and the galilean satellites, Science, 274:377– 385, 1996; D. Fischer, Mission Jupiter: The Spectacular Journey of the Galileo Spacecraft, Copernicus Books, 2001; M. Hanlon, The Worlds of Galileo: The Inside Story of NASA’s Mission to Jupiter, St. Martin’s Press, 2001; D. M. Harland, Jupiter Odyssey: The Story of NASA’s Galileo Mission, Springer-Praxis, 2000.
Jurassic The system of rocks deposited during the middle part of the Mesozoic Era, and encompassing an interval of time between about 200 and 142 million
years ago, based on radiometric dating. It takes its name from the Jura Mountains of Switzerland. Its rich marine invertebrate faunas in western Europe have been the subject of intensive study since the pioneering days of geology in the early nineteenth century, and provided the basis for the fundamental stratigraphic concepts of stages and biozones. See DATING METHODS. Subdivisions. The Jurassic System is subdivided into 11 stages which, with the exception of the Tithonian, are named from localities in England, France, and Germany (Fig. 1). These, and the much greater number of zones, are based upon ammonites, which are by far the most valuable fossils biostratigraphi-
(Fig. 1 shows the eleven Jurassic stages in order from oldest to youngest: Hettangian, Sinemurian, Pliensbachian, Toarcian, Aalenian, Bajocian, Bathonian, Callovian, Oxfordian, Kimmeridgian, and Tithonian, grouped into Lower, Middle, and Upper subsystems, with stage-boundary ages running from about 200 to 142 million years.)
Fig. 1. Succession of Jurassic stages, with estimated radiometric ages in millions of years. (After J. Palfy et al., A U-Pb and 40Ar/39Ar time scale for the Jurassic, Can. J. Earth Sci., 37:923–944, 2000)
cally because of their high rate of species turnover in time due to rapid evolution and extinction. The most refined stratigraphic subdivisions have been made in the British Isles, with 54 zones and 176 subzones. Because of biogeographic provinciality, with different ammonite taxa inhabiting Boreal and Tethyan realms, difficulties of correlation can occur for younger Jurassic strata; and the youngest stage in the Tethyan Realm, the Tithonian, embracing most of the world, is equivalent to the Volgian stage of the Boreal Realm, extending from northern Eurasia to northern North America. The ammonites of the Volgian are quite different from those of the stratigraphically equivalent Tithonian. In the absence of ammonites, dinoflagellates are the most useful marine fossils for correlation, but in nonmarine strata problems of correlation are considerable, and stratigraphically less satisfactory pollen and spores have to be used. See STRATIGRAPHY. Paleogeography and sea level. The main continental masses were grouped together as the supercontinent Pangaea, with a northern component, Laurasia, separated from a southern component, Gondwana, by a major seaway, Tethys, which expanded in width eastward (Fig. 2). From about Middle Jurassic times onward, this supercontinent began to split up, with a narrow ocean being created between eastern North America and northwestern Africa, corresponding to the central sector of the present Atlantic Ocean. At about the same time, and continuing into the Late Jurassic, separation began between the continents that now surround the Indian Ocean, namely Africa, India, Australia, and Antarctica. As North America moved westward, it collided with a number of oceanic islands in the eastern part of the PaleoPacific. Because the impingement was an oblique one, there was a general tendency for these accreted landmasses to be displaced northward along the cordilleran zone of the subcontinent. Other examples of so-called displaced terranes are known on the Asian side of the North Pacific, and some of the accretion of oceanic islands took place in Jurassic times. There were also important paleogeographic changes later in the period involving the Tethys zone. An older, so-called Palaeotethys was progressively closed as an extensive, narrow continent known as Cimmeria, extending east-west, and collided with the southern margin of Eurasia. The name comes from the Crimean Peninsula of Russia, where there is well-displayed evidence of an intra-Jurassic orogenic disturbance indicative of continental collision. See OROGENY; PALEOGEOGRAPHY. Sea level rose progressively through the period, with a corresponding flooding of the continents by shallow epeiric seas, that is, shallow seas that covered part of the continents but remained connected to the ocean. At the beginning, such seas covered less than 5% of the continents, but near the end, in Oxfordian and Kimmeridgian times, they covered approximately 25% (Fig. 2). The Jurassic sea-level curve also shows a succession of smaller-scale changes, of a duration of a few million years. Some of these, such
Fig. 2. Approximate distribution of land and sea in the Oxfordian stage. Small islands are excluded, but boundaries of modern continents are included as a reference.
as the Early Toarcian sea-level rise, are clearly global or eustatic, but others are more controversial and may reflect regional tectonic activity rather than truly global phenomena. It is uncertain by how much the sea level rose during the course of the period; but by using a hypsometric method, an estimate of between 330 and 500 ft (100 and 150 m) can be made. See PALEOCEANOGRAPHY. Climate. The climate of Jurassic times was clearly more equable than at present, as indicated by two sets of facts. The first concerns the distribution of fossil organisms. Thus a number of ferns whose living relatives cannot tolerate frost are distributed over a wide range of paleolatitudes, sometimes as far as 60◦ N and S. Similarly, coral reefs, which are at present confined to the tropics, occur in Jurassic strata in western and central Europe, beyond the paleotropical zone. Many other groups of organisms had wide latitudinal distribution, and there was much less endemism (restriction to a particular area) with respect to latitude than there is today. The second set of facts concerns the lack of evidence for polar icecaps, such as extensive tillites or striated pavements. However, there must have been strong seasonal contrasts of temperature within the Pangean supercontinent, and climatic modeling suggests winter temperatures at zero Celsius at or close to the paleopoles. A limited amount of evidence from northern Siberia and arctic North America, in the form of apparent glacial dropstones and glendonites, suggests the possibility of some ice, but this ice is likely to have been seasonally transient and small in volume. There is no evidence of any significant change in the temperature regime through the Jurassic, but there are indications of a change in the humidity-
aridity spectrum. Unlike the present, there were no tropical rainforests. Instead, a large area of western Pangea experienced an arid to semiarid climate in low latitudes, especially at some distance from the ocean. Precipitation is likely to have been dominantly monsoonal rather than zonal, a pattern unlike that of today. For most of the period, the continental area represented today by Eurasia had a comparatively humid climate, as indicated in nonmarine sediments by coals and the abundance of the clay mineral kaolinite. Toward the end of the Jurassic, however, there was a change to a more arid climate, indicated by the disappearance of coals and kaolinite and the occurrence of evaporites such as rock salt and gypsum. The reason for this change is unclear, but it may be bound up with a rainshadow effect created by the collision of the Cimmerian continent. See PALEOCLIMATOLOGY; SALINE EVAPORITES. Tectonics and volcanicity. Most of Pangea experienced tensional tectonics as the supercontinent began to break up. This is manifested by graben and half-graben structures, with associated alkaline volcanicity. By far the largest flood basalt province is that of the Karoo in South Africa, most of the basalts and associated igneous rocks being erupted in the Early Jurassic, prior to the breakup of Africa, Madagascar, and India. The Middle Jurassic Ferrar dolerites of Victoria Land, Antarctica, and the contemporaneous Tasmanian dolerites are further manifestations of tensional tectonics, as are earliest Jurassic basalts in eastern North America and Morocco, again signifying tension prior to the Atlantic opening. The North Sea region of western Europe is in effect an aborted oceanic rift, with a major phase of tensional activity and associated volcanicity in the Middle and Late
Jurassic Jurassic. This did not lead, however, to the creation of true ocean. See BASALT; GRABEN. Compressional tectonics associated with subduction of ocean floor took place in many parts of the Pacific margins, with associated calc-alkaline volcanicity. An excellent example is the Andes. The North Pacific margins were also associated with significant strike-slip faulting bound up with the accretion of displaced terranes. The other important zone of compressional tectonics was along the southern margin of Eurasia, and is involved with the collision of the Cimmerian continent. See FAULT AND FAULT STRUCTURES. Since it is not plausible to invoke the melting and freezing of polar ice caps to account for Jurassic sealevel change, this change must be bound up with tectonic activity. The most plausible mechanism for accounting for long-term sea-level rise is the growth of oceanic ridges, displacing seawater onto the continents, but the cause of short-term sea-level changes is more obscure and remains controversial. See CONTINENTS, EVOLUTION OF; GEOSYNCLINE; MID-OCEANIC RIDGE; PLATE TECTONICS; SUBDUCTION ZONES. Vertebrate fauna. The vertebrate terrestrial life of the Jurassic Period was dominated by the reptiles. The dinosaurs had first appeared late in the Triassic from a thecodont stock, which also gave rise to pterosaurs and, later, birds. From small bipedal animals such as Coelophysis, there evolved huge, spectacular creatures. These include the herbivorous Apatosaurus, Brontosaurus, Brachiosaurus, Diplodocus, and Stegosaurus as well as the carnivorous, bipedal Allosaurus. Only two rich dinosaur faunas are known from Jurassic deposits, the Morrison Formation of the United States Western Interior and the approximately contemporary Tendaguru Beds of Tanzania. The two faunas are strikingly similar at family and generic level, which strongly suggests that free land communications existed between western North America and East Africa until quite late in the period, a fact that is not easy to reconcile with some paleogeographic reconstructions. See DINOSAUR. Flying animals include the truly reptilian pterosaurs and the first animals that could be called birds as distinct from reptiles, as represented by the pigeon-sized Archaeopteryx. There were two important groups of reptiles that lived in the sea, the dolphinlike ichthyosaurs and the long-necked plesiosaurs. Both of these groups had streamlined bodies and limbs beautifully adapted to marine life. Turtles and crocodiles are also found as fossils in Jurassic deposits. See ARCHAEORNITHES; PTEROSAURIA. Jurassic mammals, known mainly from their teeth alone, were small and obviously did not compete directly with the dinosaurs. They included a number of biologically primitive groups such as the triconodonts, docodonts and multituberculates. The fish faunas were dominated by the holosteans, characterized by heavy rhombic scales. Their evolutionary successors, the teleosts, probably appeared shortly before the end of the period. See DOCODONTA; HOLOSTEI; MULTITUBERCULATA; TELEOSTEI; TRICONODONTA.
Invertebrate fauna. Because they are far more abundant, the invertebrate fossil faunas of the sea are of more importance to stratigraphers and paleoecologists than are the vertebrates. By far the most useful for stratigraphic correlation are the ammonites, a group of fossil mollusks related to squids. They were swimmers that lived in the open sea, only rarely braving the fluctuating salinity and temperature of inshore waters. They are characteristically more abundant in marine shales and associated finegrained limestones. From a solitary family that recovered from near extinction at the close of the Triassic, there radiated an enormous diversity of genera. Many of these were worldwide in distribution, but increasingly throughout the period these was a geographic differentiation into two major realms. The Boreal Realm occupied a northern region embracing the Arctic, northern Europe, and northern North America. The Tethyan Realm, with more diverse faunas, occupied the rest of the world. See LIMESTONE; SHALE. In most facies the bivalves, which flourished in and on shallow, muddy sea bottoms, are the most abundant and diverse of the macrofauna. They included many cemented forms such as Ostrea, recliners such as Gryphaea, swimmers such as the pectinids and limids, and rock borers such as Lithophaga. However, the majority were burrowers: either relatively mobile, shallow burrowers or forms occupying deep permanent burrows and normally still found in their positions of growth. See BIVALVIA; FACIES (GEOLOGY). Brachiopods were much more abundant and diverse than they are today. The range of depths below the sea surface that they occupied is far wider than for the bivalves, and a definite depth zonation can be established in Europe, just as with the ammonites. See BRACHIOPODA. Echinoderms are best represented as fossils by the crinoids and echinoids, and were all inhabitants of shallow seas, unlike some of the modern representatives of this class. The echinoids include both primitive regular forms, such as the cidaroids, and irregular forms, such as Clypeus and Pygaster. See ECHINODERMATA; PYGASTEROIDA. Corals belonged to the still extant Scleractinia group and included reef builders such as Isastrea and Thamnasteria. Calcareous and siliceous sponges are also common locally, even forming reefs. It seems likely that the siliceous sponges inhabited somewhat deeper water than the corals. See SCLERACTINIA; SCLEROSPONGIAE. The invertebrate microfaunas are represented by abundant foraminifera, ostracods, and radiolaria. Foraminifera and ostracods are of great value to oil companies in correlation studies. See OSTRACODA; RADIOLARIA. Not all Jurassic invertebrates lived in the sea. Some lived in continental environments such as lakes and rivers; they include a few genera of bivalves, gastropods, and arthropods. These faunas are far less diverse than their marine counterparts. See ARTHROPODA; GASTROPODA; PALEONTOLOGY.
Jute Flora. With regard to the plant kingdom, the Jurassic might well be called the age of gymnosperms, the nonflowering “naked seed” plants, forests of which covered much of the land. They included the conifers, gingkos, and their relatives, the cycads. Ferns and horsetails made up much of the remainder of the land flora. These and others of the Jurassic flora are still extant in much the same forms. See CYCADALES; GINKGOALES. Remains of calcareous algae are widely preserved in limestone. Besides the laminated sedimentary structures produced by what have traditionally been regarded as blue-green algae but are actually cyanobacteria, and known as oncolites and stromatolites, there are skeletal secretions of other groups. Some of these are benthic forms, but many pelagic limestones are seen under the electron microscope to be composed largely of tiny plates of calcite, known as coccoliths, which are secreted by certain planktonic algae also called coccoliths. See ALGAE; CYANOBACTERIA; STROMATOLITE. It seems likely that the Late Jurassic saw the emergence of the flowering plants, the angiosperms, since well-developed forms of this group existed in the Early Cretaceous. However, it is not quite understood how they emerged, and a satisfactory direct evolutionary ancestor has yet to be identified with certainty. Economic geology. Jurassic source rocks in the form of organic-rich marine shale and associated rocks contain a significant proportion of the world’s petroleum reserves. A familiar example is the Upper Jurassic Kimmeridge Clay of the North Sea, and its stratigraphic equivalents in western Siberia. Some of the source rocks of the greatest petroleum field of all, in the Middle East, are also of Late Jurassic age. See MESOZOIC; PETROLEUM GEOLOGY. A. Hallam Bibliography. W. J. Arkell, Jurassic Geology of the World, 1956; J. W. C. Cope et al. Jurassic, pts. 1 and 2, Geol. Soc. Lond. Spec. Rep. 14 and 15, 1980; A. Hallam, Jurassic climates as inferred from the sedimentary and fossil record, Phil. Trans. Roy. Soc. Lond., B341:287–296, 1993; A. Hallam, Jurassic Environments, 1975; A. Hallam, A review of the broad pattern of Jurassic sea-level changes and their possible causes in the light of current knowledge, Palaeogeog., Palaeoclimatol., Palaeoecol., 167:23– 37, 2001.
Jute A natural fiber obtained from two Asiatic species, Corchorus capsularis and C. olitorius, of the plant family Tiliaceae (Fig. 1). These are tall, slender, halfshrubby annuals, 8–12 ft (2.5–3.5 m) tall. See MALVALES. When harvested the stems are retted in tanks or pools to rot out the softer tissues. The strands of jute fiber are then loosened by beating the stems on the surface of the water. The fibers are not very strong and deteriorate quickly in the presence of moisture, especially salt water. Despite these weak-
Fig. 1. Morphological features of Corchorus capsularis.
nesses, jute is much used. It is inexpensive and easily spun and converted into coarse fabrics. It is made into gunny, burlap bags, sacks for wool, potato sacks, covers for cotton bales, twine, carpets, rug cushions, curtains, and a linoleum base. It is also used in making coarse, cheap fabrics, such as novelty dress goods. Most of the commercial supply comes from plants grown in the Ganges and Brahmaputra valleys in Bangladesh and India. See FIBER CROPS; NATURAL FIBER. Elton G. Nelson A number of diseases that affect jute cause losses in yield and reduce fiber quality. “Runner” and “specky” fiber are primarily due to disease-producing organisms. The fungus Macrophomina phaseolina is believed to cause the most serious disease of the two species of jute. It is seed-borne and soil-borne, and pycnidiospores from susceptible plants besides jute also serve as sources of infection. The stem, leaves, and roots of both young and older plants are subject to attack. Stem infection usually takes place through a leaf petiole or at a node (Fig. 2). Root rot is complicated in that severity is increased when M. phaseolina is in combination with other fungi, bacteria, or nematodes, such as Fusarium solani, Pseudomonas sp., and Meloidogyne incognita, respectively. See AGONOMYCETES; LEAF; NEMATA (NEMATODA); ROOT (BOTANY); SEED; STEM. In contrast to Macrophomina phaseolina,
Fig. 2. Disease lesions on jute stems.
Colletotrichum capsici causes lesions on the stem internodes and may also attack seedlings and capsules of C. capsularis. Macrophoma corchori and Diplodia corchori cause stem diseases. Two species of
bacteria, Xanthomonas makatae and X. makatae var. olitorii, attack the stem and leaves of both C. capsularis and C. olitorius. Pythium splendens causes a root rot and subsequent wilt of C. capsularis, and indications are that other species of Pythium also are root pathogens of jute. See FRUIT. Other fungi which attack jute are Sclerotium rolfsii, Curvularia subulata, Cercospora corchori, Rhizoctonia solani, Helminthosporium sp., and Alternaria sp. Seed treatments with approved fungicides are recommended for control of seed-borne pathogens and seedling diseases. Stem rot may be prevented by spraying with fixed copper compounds. The excessive use of nitrogenous fertilizers increases the incidence of stem diseases. Root-rot control requires the use of crop rotation and, in some areas, the use of recently developed varieties of C. capsularis which are more tolerant of certain root-rot pathogens than C. olitorius. See PLANT PATHOLOGY. Thomas E. Summers
K
Kalahari Desert — Kyanite
Kalahari Desert
Map of the Kalahari Desert.
A flat, largely waterless, sparsely populated, and sand-covered region in south-central Africa which occupies much of Botswana, plus adjacent parts of South Africa and Namibia. The precise boundaries of the Kalahari are difficult to define since it merges into humid regions to the north and east, and the arid Karoo and Namib further south and west. However, the generally agreed-upon boundaries are the Okavango-Zambezi “swamp zone” in the north and the Orange River in the south, with the western and eastern boundaries (in eastern Namibia and eastern Botswana, respectively) coinciding with the extent of the sandy Kalahari group sediments (see illustration). See AFRICA. Climate in the Kalahari varies from arid to semiarid along a southwest–northeast gradient. The driest, southwestern areas (which can be considered true desert) receive as little as 150 mm (6 in.) of rainfall per year, with precipitation levels rising to the north and east to reach around 650 mm (26 in.) in northeast Botswana. However, much of this rain falls during the Southern Hemisphere summer months when high evapotranspiration rates also occur (exceeding 4000 mm or 157 in. per year in the southwest Kalahari), contributing to the desertlike conditions. See DESERT. The Kalahari terrain is relatively flat, broken only by low hills along its periphery and dunefields in the extreme southwest. Permanent drainage is restricted to the north where the Chobe, Zambezi, and Okavango rivers dissect the landscape, the latter ending abruptly in the Okavango Delta swamps. Water is also present on a seasonal basis within the salt lakes of the Makgadikgadi and Etosha basins of central Botswana and northern Namibia, respectively, and within numerous smaller landscape depressions (called pans). In addition to rivers and lake basins, extensive networks of “fossil” dry valleys also cut across the region, suggesting that wetter climates may have occurred in the recent geologic past. See DUNE.
Soils developed within the sandy Kalahari sediments are typically skeletal (weakly bound) and have a relatively low nutrient status, although finertextured soils may be found in association with rivers, pans, dry valleys, and interdune areas. Vegetation patterns broadly follow the climatic gradient, in terms of diversity and biomass, although, again, local variations occur near drainage features. Much of the region is covered by savanna communities, with shrub savanna prevalent in the southwest and tree savanna in the central and northern Kalahari. Exceptions occur in the Okavango swamps, where aquatic grasslands and riparian forests dominate, around saline lake complexes where savanna grasslands are common, and in the relatively wet north and east where dry deciduous forests are the
Kale dominant vegetation type. See PLANT GEOGRAPHY; SAVANNA; SOIL. The Kalahari has supported, and in some areas still supports, large populations of migratory wild herbivores, including antelope, zebra, and giraffe, along with associated predators, including lion, leopard, cheetah, and hyena. Pressures from hunting and increasing numbers of domestic herbivores, particularly cattle (with the associated fenced-in land), have meant that wildlife populations have been increasingly restricted to protected areas. At the same time, human populations are growing, most notably along the eastern and southern desert periphery (such as Gaborone, Botswana and Upington, South Africa), as well as within the desert core in conjunction with tourism (such as Maun, Botswana) and diamond mining (such as Orapa and Jwaneng, Botswana) developments. David J. Nash Bibliography. A. S. Goudie, Great Warm Deserts of the World, Oxford University Press, 2002; D. Sporton and D. S. G. Thomas (eds.), Sustainable Livelihoods in Kalahari Environments: Contributions to Global Debates, Oxford University Press, 2002; D. S. G. Thomas and P. A. Shaw, The Kalahari Environment, Cambridge University Press, 1991; L. van der Post, The Lost World of the Kalahari, Penguin, London, 1962; E. N. Wilmsen, Land Filled with Flies: A Political Economy of the Kalahari, University of Chicago Press, 1989.
the United States. Cultural practices are similar to those used for cabbage, but kale is more sensitive to high temperatures. Strains of the Scotch and Siberian varieties are most popular. Kale is moderately tolerant of acid soils. Monthly mean temperatures below 70◦F (21◦C) favor best growth. Harvesting is generally 2–3 months after planting. Virginia is an important producing state. See BROCCOLI; BRUSSELS SPROUTS; CABBAGE; CAPPARALES; COLLARD; KOHLRABI. H. John Carew
Kaliophilite A rare mineral tectosilicate found in volcanic rocks high in potassium and low in silica. Kaliophilite is one of three polymorphic forms of KAlSiO4; the others are the rare mineral kalsilite and an orthorhombic phase formed artificially at about 930◦F (500◦C). It crystallizes in the hexagonal system in prismatic crystals with poor prismatic and basal cleavage. The hardness is 6 on Mohs scale, and the specific gravity is 2.61. At high temperatures a complete solid-solution series exists between KAlSiO4 and NaAlSiO4, but at low temperatures the series is incomplete. The principal occurrence of kaliophilite is at Monte Somma, Italy. See SILICATE MINERALS. Cornelius S. Hurlbut, Jr.
Kalsilite Kale Either of two cool-season biennial crucifers, Brassica oleracea var. acephala and B. fimbriata, of Mediterranean origin. Kale belongs to the plant order Capparales, and the plant family Brassicaceae, commonly known as the mustard family. Cabbage, kohlrabi, broccoli, collards, and Brussels sprouts are all variants of B. oleracea Kale is grown for its nutritious green curled leaves which are cooked as a vegetable (see illus.). Distinct varieties (cultivars) are produced in Europe for stock feed. Kale and collards differ only in the form of their leaves; both are minor vegetables in
A rare mineral described in 1942 from volcanic rocks at Mafuru, in southwest Uganda. Kalsilite has since been synthesized. It is one of the three polymorphic forms of KAlSiO4; the others are kaliophilite and an orthorhombic phase formed artificially at about 930◦F (500◦C). The mineral as shown by x-ray photographs is hexagonal. The specific gravity is 2.59. In index of refraction and general appearance in thin section it resembles nepheline and is difficult to distinguish from it. Structurally the two minerals are similar but belong to different crystal classes. The rock in which kalsilite was found has a darkcolored, fine-grained matrix with large olivine crystals and greenish-yellow patches. These patches are intimate mixtures of diopside, calcite, and kalsilite. See FELDSPATHOID. Cornelius S. Hurlbut, Jr.
Kangaroo
Kale (Brassica oleracea var. acephala), cultivar Vates. (Joseph Harris Co., Rochester, New York)
Common name for a member of the family Macropodidae, the largest living marsupials. The 61 species in this family also include the wallabies, wallaroos, quokka, and pademelons. In Australia, the term kangaroo is applied only to the largest species of kangaroos, while all small and medium-sized grounddwelling kangaroos are called wallabies. Except for their size, wallabies are practically the same as kangaroos in appearance as well as habits. See AUSTRALIA; MARSUPIALIA. Description. There are many different kinds of kangaroos, ranging from the familiar great kangaroos
Kaolinite
Red kangaroo (Macropus rufus). (Photo by Gerald and Buff c California Academy of Sciences) Corsi;
(Macropus) to the musky rat kangaroo (Potoroidae), which is roughly the size of a rabbit but more like a rat in appearance. There are three species of great kangaroos, all of which have extremely long, powerful, highly modified hindlimbs and relatively small, unspecialized forelimbs. The long, thick, muscular tail serves as a balancing organ and also adds impetus to the leaps. Large kangaroos can sustain hopping speeds faster than 55 km/h (35 mi/h). The western gray kangaroo or Forester (M. fuliginosus) has a brownish coat with black hands and feet. The eastern gray kangaroo (M. giganteus) is the only large kangaroo in Tasmania. It is more reddish-brown than the western gray kangaroo and has coarser and longer fur. The red kangaroo (M. rufus) [see illustration] is the largest and most gregarious kangaroo. It is richly colored, powerful, and graceful. The male is a brilliant wine-red. The female, or doe, is more slender than the male, is more lightly built, and has great speed. A smoky-blue in color, the female has been named the blue flyer. Adult males, or boomers, may have a head and body length of 1300–1600 mm (50–62 in.) and a tail length of 1000–1200 mm (39–46 in.). They may weigh 20–90 kg (43–197 lb) but seldom exceed 55 kg (120 lb). Females are smaller. Kangaroos and their relatives primarily inhabit the prairies, open forest, and brush country in Australia but also are found in Tasmania and New Guinea. They feed mostly on plant foods including grasses, forbs (herbs other than grass), leaves, seeds, fruits, and bulbs. The dental formula is I 3/1, C O/O, Pm 2/2, M 4/4 for a total of 32 teeth. Following a gestation of 30–39 days, a single young (joey) is born. A newborn macropod is only 5–15 mm (0.2–0.6 in.) long with undeveloped eyes, hindlimbs, and tail. It
uses its strong, well-developed forelimbs to crawl up and into its mother’s pouch (marsupium), where it attaches to a teat. Depending on the species, the joey may remain completely in the pouch for 60– 120 days. Following the pouch period, it will venture out to find food but will regularly return to the pouch, especially in times of danger. This period, which may last for several weeks to several months, is known as the young-at-foot period. When entering the pouch, the joey enters headfirst and then turns around and pokes its head back out. The joey is forced from the pouch just before the birth of the next young, but it can still put its head back into the pouch and suckle from the teat. The female is capable of producing different qualities of milk from the two teats—a feat achieved by having the mammary glands under separate hormonal control. Longevity in the wild is normally 12–18 years. Although great numbers of red and gray kangaroos are shot each year because they are considered pests of crops and pastures, the chief natural predator is the dingo (Canis lupus familiaris), the wild Australian dog. Tree kangaroos. Tree kangaroos (Dendrolagus) are arboreal. They still retain the long hindlimbs for leaping, but the limbs have become shorter and broader, and the foot pads are rough to assist in climbing. They have a head and body length of 520–810 mm (20– 32 in.) and a tail length of 408–935 mm (16–36 in.) They weigh 6.5–14.5 kg (14–32 lb). The pelage is usually long and may be blackish, brownish, grayish, or reddish. One species has a bright yellow face, belly, feet, and tail. The forelimbs and hindlimbs are nearly equal in size. The long, well-furred tail serves to help balance and steer the kangaroo as it makes long, flying leaps in the trees. The tail is also used to brace the animal when climbing. The 10 species live primarily in mountainous rainforest in Queensland (Australia) and New Guinea. They are very agile and are active both day and night. Food consists chiefly of tree leaves, ferns, and many kinds of fruits. Some species spend a considerable amount of time on the ground, whereas others spend almost all of their time in the trees. Clearing of rainforest and hunting are reducing their ranges. Habitat fragmentation may prevent dispersal and increase inbreeding. Three species are currently classified as endangered by the International Union for the Conservation of Nature and Natural Resources (IUCN), two are classified as vulnerable, and two as near-threatened. Donald W. Linzey Bibliography. B. Grzimek (ed.), Grzimek’s Encyclopedia of Mammals, McGraw-Hill, 1990; D. Macdonald, The Encyclopedia of Mammals, Andromeda Oxford, 2001; R. N. Nowak, Walker’s Mammals of the World, 6th ed., Johns Hopkins University Press, 1999.
Kaolinite A common hydrous aluminum silicate mineral found in sediments, soils, hydrothermal deposits, and sedimentary rocks. It is a member of a group of clay
Kaolinite minerals called the kaolin group minerals, which ˚ and 10 A ˚ ), nacrite, include dickite, halloysite (7 A ordered kaolinite, and disordered kaolinite. These minerals have a theoretical chemical composition of 39.8% alumina, 46.3% silica, and 13.9% water [Al2Si2O5(OH)4], and they generally do not deviate from this ideal composition. They are sheet silicates comprising a single silica tetrahedral layer joined to a single alumina octahedral layer. Although the kaolin group minerals are chemically the same, each is structurally unique as a result of how these layers are stacked on top of one another. Kaolinite is the most common kaolin group mineral and is an important industrial commodity used in ceramics, paper coating and filler, paint, plastics, fiberglass, catalysts, and other specialty applications. See CLAY MINERALS; SILICATE MINERALS. The term “kaolin” is derived from the Chinese word Kau-ling (high ridge), which is the name given to a hill in the Jiang Xi Province where white clay was mined for porcelain as early as the seventh century. Deposits of relatively pure kaolinite are called kaolin, which leads to some confusion because the terms are often used interchangeably; however, kaolinite is the name for the mineral whereas kaolin applies specifically to a rock made up of more than one mineral, or to the mineral group. To avoid confusion, the term “kandite” was introduced as the mineral group name, but it has not been universally accepted and consequently kaolin group is preferred. Structure. The kaolinite unit layer consists of one silica tetrahedral sheet and one alumina octahedral sheet stacked such that apical oxygens from the tetrahedral sheet replace the hydroxyls in the octahedral sheet (Fig. 1). This stacked unit of one tetrahedral and one octahedral sheet is known as a 1:1 layer. The tetrahedral sheet is made up of individual silica tetrahedra linked to neighboring tetrahedra through three shared oxygens occupying a basal plane called the siloxane surface. The fourth, or apical, oxygen points upward in a direction normal to the siloxane surface and forms part of the adjacent octahedral
aluminum
silicon
Fig. 1. Diagram of a kaolinite unit layer. (After R. E. Grim, Applied Clay Mineralogy, McGraw-Hill, 1962)
Fig. 2. Scanning electron micrograph showing pseudohexagonal kaolinite particles (scale bar 1 µm).
sheet. The octahedral sheet is made up of octahedrally coordinated Al3+ ions linked together at the octahedral edges. In kaolinite there is little to no cation substitution and therefore essentially no layer charge. In the cases where substitution does occur, the most common are the substitution of Al3+ for Si4+ in the tetrahedral sheet and Fe2+ or Fe3+ for Al3+ in the octahedral sheet. Kaolinite exhibits varying amounts and types of stacking disorder related to translations between adjacent layers. Ordered kaolinite has very little stacking disorder compared to disordered kaolinite. Properties. Kaolinite crystals are typically six-sided or pseudohexagonal in outline and may occur as single thin platelets or large accordionlike vermiform crystals (Fig. 2). Individual crystals range from approximately 0.1 to over 10 micrometers in diameter and from 15 to 50 nanometers in thickness. Kaolinite has a relatively high charge density and is therefore naturally hydrophilic, making it easily dispersible in water. It may be chemically modified so that it becomes hydrophobic and organophilic. It does not naturally swell in water unless treated with intercalation compounds such as hydrazine, urea, and ammonium acetate. Kaolinite is intrinsically white, has low plasticity, and has a refractive index of about 1.57. It has a low cation exchange capacity, and is chemically inert over a wide pH range except in some organic systems where it exhibits catalytic activity. Genesis. Certain geochemical conditions favor the formation of kaolinite, which is most commonly found in surficial environments as a product of chemical weathering. Humid, temperate to tropical environments, where dilute, acid solutions are derived from the leaching of well-drained organic-rich soils, provide ideal conditions for kaolinite formation. The most common parent materials from which kaolinite is derived are feldspars and micas. These minerals, which contain the silica and alumina required for kaolinite formation, readily weather to kaolinite under the conditions listed above. Kaolinite also fills void spaces in sandstone, where it forms as a result of burial diagenesis. See DIAGENESIS. Large accumulations of relatively pure kaolinite are classified as primary or secondary. Primary
Kapitza resistance
Kapitza resistance A resistance to the flow of heat across the interface between liquid helium and a solid. A temperature difference is required to drive heat from a solid into liquid helium, or vice versa; the temperature discontinuity occurs right at the interface. The Kapitza resistance, discovered by P. L. Kapitza, is defined in the equation below, where TS and TH are the solid RK =
(TS − TH)/(Q̇/A)
˙ and helium temperatures and Q/A is the heat flow per unit area across the interface. See CONDUCTION (HEAT). In principle, the measured Kapitza resistance should be easily understood. In liquid helium and solids (such as copper), heat is carried by phonons, which are thermal-equilibrium sound waves with frequencies in the gigahertz to terahertz region. The acoustic impedance of helium and solids can differ by up to 1000 times, which means that the phonons mostly reflect at the boundary, like an echo from a cliff face. This property together with the fact that the number of phonons dies away very rapidly at low temperatures means that at about 1 K there are few phonons to carry heat and even fewer get across the interface. The prediction is that the Kapitza resistance at the interface is comparable to the thermal resistance of a 10-m (30-ft) length of copper with the same cross section. See ACOUSTIC IMPEDANCE; PHONON; QUANTUM ACOUSTICS. The reality is that above 0.1 K and below 0.01 K (10 mK) more heat is driven by a temperature difference than is predicted (see illus.). Above 0.1 K
1 10−2 10−1 absolute temperature, K
10
Plot of the Kapitza resistance between copper and liquid helium-3 (3He) as a function of absolute temperature. The acoustic theory is based upon heat transfer by thermalequilibrium sound waves. The line labeled experiment is representative of many experimental results.
Kapok tree this is now understood to be a result of imperfections such as defects and impurities at the interface, which scatter the phonons and allow greater transmission. See CRYSTAL DEFECTS. The enormous interest in ultralow-temperature (below 10 mK) research generated by the invention of the dilution refrigerator and the discovery of superfluidity in liquid helium-3 (3He) below 0.9 mK also regenerated interest in Kapitza resistance, because heat exchange between liquid helium and solids was important for both the dilution refrigerator and superfluidity research. An ingenious technique was invented to overcome the enormous Kapitza resistance at 1 mK: The solid is powdered, and the powder is packed and sintered to a spongelike structure to enhance the surface area. In this way a 1-cm3 (0.06-in.3) chamber can contain up to 1 m2 (10 ft2) of interface area between the solid and the liquid helium. It was found that at 1 mK the Kapitza resistance is 100 times smaller than predicted by the phonon model. There have been two explanations for the anomaly, and probably both are relevant. One is that energy is transferred by magnetic coupling between the magnetic 3He atoms and magnetic impurities in the solid or at the surface of the solid; the other is that the spongelike structure has quite different, and many more, phonons than a bulk solid and that these can transfer heat directly to the 3He atoms. Whatever its cause, this anomaly has had a major impact on ultralow-temperature physics. On the one hand, it has allowed liquid 3He to be cooled by adiabatic demagnetization refrigerators to well below 1 mK (the record is about 10 µK) so that its superfluid properties can be studied over a wide range of temperature. On the other, it has allowed the development of the dilution refrigerator so that it can operate to below 2 mK. In the dilution refrigerator the fluid that refrigerates is liquid 3He. The purpose, however, is to refrigerate a sample or an experiment, and that requires heat exchange to the cold liquid 3 He. Such refrigerators are used to investigate the properties of matter at ultralow temperature. See ADIABATIC DEMAGNETIZATION; LIQUID HELIUM; LOWTEMPERATURE PHYSICS; SUPERFLUIDITY. John P. Harrison Bibliography. D. F. Brewer (ed.), Progress in Low Temperature Physics, vol. 12, 1989; E. T. Swartz and R. O. Pohl, Thermal boundary resistance, Rev. Mod. Phys., 61:605–668, 1989.
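As a rough numerical illustration of the defining relation above, RK = (TS − TH)/(Q̇/A), the short Python sketch below computes the heat flow per unit area that a small temperature step can drive across an interface; the resistance value it uses is an arbitrary placeholder rather than a measured property of any real copper-helium boundary.

# Minimal sketch of the defining relation R_K = (T_S - T_H) / (Q/A) given above.
# The Kapitza resistance value is an arbitrary placeholder, not measured data.

def heat_flux_across_boundary(delta_t_kelvin, kapitza_resistance):
    """Heat flow per unit area (W/m^2) driven across a solid-liquid-helium
    interface by a temperature step delta_t_kelvin (K), for a boundary with
    the given Kapitza resistance (K m^2/W)."""
    return delta_t_kelvin / kapitza_resistance

# Example: a 1 mK temperature step across a boundary with an assumed
# resistance of 0.1 K m^2/W drives 0.01 W/m^2 through the interface.
print(heat_flux_across_boundary(1e-3, 0.1))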
Kapok tree Also called the silk-cotton tree (Ceiba pentandra), a member of the bombax family (Bombacaceae). The tree has a bizarre growth habit and produces pods containing seeds covered with silky hairs called silk cotton (see illus.). It occurs in the American tropics, and has been introduced into Java, Philippine Islands, and Sri Lanka. The silk cotton is the commercial kapok used for stuffing cushions, mattresses, and pillows. Kapok has a low specific gravity and is im-
Pods and leaves of kapok tree (Ceiba pentandra).
pervious to water, making it excellent for filling life preservers. See MALVALES. Perry D. Strausbaugh; Earl L. Core
Karm ´ an ´ vortex street A double row of line vortices in a fluid. Under certain conditions, a K´arm´an street is shed in the wake of bluff cylindrical bodies when the relative fluid velocity is perpendicular to the generators of the cylinder (Fig. 1). This shedding of eddies occurs first from one side of the body and then from the other, an unusual phenomenon because the oncoming flow may be perfectly steady. Vortex streets can often be seen, for example, in rivers downstream of the columns supporting a bridge. The streets have been studied most completely for circular cylinders at low subsonic flow speeds. Regular, perfectly periodic vortex streets occur in the range of Reynolds number (Re) 50–300, based on cylinder diameter. Above a Reynolds number of 300, a degree of randomness begins to occur in the shedding due to secondary instabilities, which becomes progressively greater as Reynolds number increases, until finally the wake is completely turbulent. The highest Reynolds number at which some slight periodicity is still present in the turbulent wake is about 106. See REYNOLDS NUMBER. Vortex streets and their effects are common. For example, they can be created by steady winds blowing past smokestacks, transmission lines, bridges, missiles about to be launched vertically, and pipelines aboveground in the desert. They are also related to the phenomenon that generates cloud a U
Fig. 1. Diagram of a Kármán vortex street. The streamwise spacing of the vortices, h, and the spacing normal to it, a, are shown.
Fig. 2. Vortex patterns downstream of Guadalupe Island, imaged by the SeaWiFS satellite.
patterns indicating the alternate shedding of vortices downstream of an island, as captured by satellite images (Fig. 2). Although the island only forms a surface obstruction to the wind, geophysical effects (Coriolis effects) generate a stagnant column of air in the atmosphere above it that creates an obstruction analogous to a cylinder. See CORIOLIS ACCELERATION. The alternating vortex shedding that coincides with a well-formed vortex street produces oscillation in lateral forces acting on the body. If the vortex shedding frequency is near a natural vibration frequency of the body, the resonant response may cause large-amplitude vibration and in some cases structural damage. The Aeolian tones, or singing of wires in a wind, is an example of forced oscillation due to formation of a vortex street. Such forces can impose on structures unwanted vibrations, often leading to serious damage. An example of unwanted vibrations which led to serious damage is the collapse of the Tacoma Narrows (Washington) suspension bridge in 1940. This large structure made of thousands of tons of steel collapsed after a few hours of ever-growing oscillations induced by a steady wind (Fig. 3).
Fig. 3. Tacoma Narrows Bridge in the middle of its wild oscillations, a little before it collapsed.
T. von K´arm´an showed that an idealized, infinitely long vortex street is stable to small disturbances if the spacing of the vortices is such that h/a = 0.281; actual spacings are close to this value. A complete and satisfying explanation of the formation of vortex streets, however, has not yet been given. For 103 < Re < 105, the shedding frequency f for a circular cylinder in low subsonic speed flow is given closely by fd/U = .21, where d is the cylinder diameter and U is stream speed; a/d is approximately 5. This means that for a cylinder of a certain diameter immersed in a fluid, the frequency of vortex shedding is proportional to the velocity of the oncoming stream and inversely proportional to the diameter. A. Roshko discovered a spanwise periodicity of vortex shedding on a circular cylinder at Re = 80 of about 18 diameters; thus, it appears that the line vortices are not quite parallel to the cylinder axis. See FLUID-FLOW PRINCIPLES; VORTEX. Arthur E. Bryson, Jr.; Demetri P. Telionis Bibliography. W. R. Debler, Fluid Mechanics Fundamentals, 1990; J. A. Liggett, Fluid Mechanics, 1994; R. L. Panton, Incompressible Flow, Wiley Interscience, 2d ed., 1996; F. M. White, Fluid Mechanics, McGraw-Hill, 6th ed., 2006.
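As a quick worked example of the relations quoted above, the Python sketch below estimates the Reynolds number and the vortex-shedding (Aeolian-tone) frequency for a thin wire in a steady wind; the wind speed, wire diameter, and kinematic viscosity are illustrative values only.

# Numerical illustration of the relations quoted above: shedding frequency
# from f*d/U ~ 0.21 (valid for 10^3 < Re < 10^5) and Re = U*d/nu.
# The speed, diameter, and viscosity below are example values only.

def shedding_frequency(speed_m_s, diameter_m, strouhal=0.21):
    """Vortex-shedding frequency (Hz) for a circular cylinder."""
    return strouhal * speed_m_s / diameter_m

def reynolds_number(speed_m_s, diameter_m, kinematic_viscosity=1.5e-5):
    """Reynolds number based on cylinder diameter (viscosity roughly that of air)."""
    return speed_m_s * diameter_m / kinematic_viscosity

u, d = 10.0, 0.02                    # 10 m/s wind past a 2-cm wire
print(reynolds_number(u, d))         # about 1.3 x 10^4, inside the quoted range
print(shedding_frequency(u, d))      # about 105 Hz, an audible Aeolian tone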
Karst topography Distinctive associations of third-order, erosional landforms indented into second-order structural forms such as plains and plateaus. They are produced by aqueous dissolution, either acting alone or in conjunction with (and as the trigger for) other erosion processes. Karst is largely restricted to the most soluble rocks, which are salt (360,000 mg/liter), gypsum and anhydrite (2400 mg/liter), and limestone and dolostone (30–400 mg/liter). The numbers denote effective solubility in meteoric waters under standard conditions. Being so soluble, salt is seen only in the driest places (for example, Death Valley, California), where it displays intensive dissolution topography. Surface gypsum karst is also comparatively rare in humid regions, but its dissolution at depth may produce widespread collapse and subsidence landforms (covered karst); it is often well developed in arid and semiarid areas such as the Pecos River valley of New Mexico. Pure limestones (CaCO3) are the principal karst rocks, hosting the greatest extent and variety of features. Dolostone [the double carbonate mineral, CaMg (CO3)2] is significantly less soluble: the type and scale of dissolutional forms developed in it is normally less than in limestone, giving rise to topographies that are transitional to fluvial (streamderived) landscapes. See DOLOMITE ROCK; GYPSUM; LIMESTONE. Karst rocks outcrop over ∼12% of the Earth’s continental and island surfaces, but the distinctive topography is limited to about 8%. However, many wellknown places are entirely karstic (the Bahamas and the majority of the Greek islands, for example), and 20–25% of the world’s population relies on ground water circulating to some extent in karst rocks, so that karst studies are of considerable practical
Fig. 1. Principal genetic types of sinkholes: solution, collapse, suffusion, and subsidence. (After D. C. Ford and P. W. Williams, Karst Geomorphology and Hydrology, 1989; copyright Chapman and Hall; used with kind permission from Kluwer Academic Publishers)
importance. See GROUND-WATER HYDROLOGY. The essence of the karst dynamic system is that meteoric water (rain or snow) is routed underground, because the rocks are soluble, rather than flowing off in surface river channels. It follows that dissolutional caves develop in fracture systems, resurging as springs at the margins of the soluble rocks or in the lowest places. A consequence is that most karst topography is “swallowing topography,” assemblages of landforms created to deliver meteoric water down to the caves. The word “karst” derives from a Slavic regional term, kras, meaning stony ground—stony because the former soil cover in western Slovenia and neighboring areas was lost into caves as a consequence of deforestation and overgrazing. Karst landforms develop at small, intermediate, and large scales. Karren. This is the general name given to smallscale forms—varieties of dissolutional pits, grooves, and runnels. Individuals are rarely greater than 10 m (30 ft) in length or depth, but assemblages of them can cover hundreds of square kilometers. On bare rock, karren display sharp edges; circular pits or runnels extending downslope predominate. Beneath soil, edges are rounded and forms more varied and intricate. The largest karren are clefts following major joint sets. Where regular, shallow, and exposed, they create the appearance of a laid paving-limestone pavement, which is common in glaciated regions. Where deeper, they may become ruiniform (corridor karst)
or, if the blocks between the clefts are tapered by runnelling dissolution, pinnacled. Dense clusters of pinnacles up to 20 m (65 ft) or more are the most spectacular karren, termed shilin (stone forests) in Yunnan, China, where the largest assemblages occur. Particularly dense and complex pitting patterns occur along seashores. The finest development is on tropical coasts where formation of deep pits and sharp residual pinnacles is enhanced by biological action—phytokarst. Sharp dissolution notching may occur at the high-tide line. Sinkholes. Also known as dolines or closed depressions, sinkholes are the diagnostic karst (and pseudokarst) landform. They range from shallow, bowllike forms, through steep-sided funnels, to vertical-walled cylinders (Fig. 1). Asymmetry is common. Individual sinkholes range from about 1 to 1000 m (3 to 3300 ft) in diameter and are up to 300 m (1000 ft) deep. Many may become partly or largely merged. Sinkholes can develop entirely by dissolution acting downward into receptor caves, by collapse of cave roofs upward, by washing of unconsolidated fine sediments such as soil cover into underlying cavities, or by slow subsidence. Most karst sinkholes are produced by a combination of the first two or three processes. On plains, sinkholes may be scattered like shell pits, or packed together with overlapping perimeters (polygonal karst, found in the Sinkhole Plain of Kentucky). Densities greater than 1000/km2 are known. In mountainous country, deep polygonal karst can create a spectacular “egg carton” topography of steep-sided sinkholes (cockpits) between conical hills, termed fengcong (peak cluster) in southern China where the finest examples are found. Dry valleys and gorges. These large-scale features are carved by normal rivers, but progressively lose their water underground (via sinkholes) as the floors become entrenched into karst strata. Many gradations exist, from valleys that dry up only during dry seasons (initial stage) to those that are without any surface channel flow even in the greatest flood periods (paleo-valleys). They are found in most plateau and mountain karst terrains and are greatest where river water can collect on insoluble rocks before penetrating the karst (allogenic rivers). Poljes. A Serbo-Croatian term for a field, polje is the generic name adopted for the largest individual karst landform. Ideally, this is a topographically closed depression with a floor of alluvium masking an underlying limestone floor beveled flat by planar corrosion. Rivers may cross the alluvium, flowing from upstream springs to downstream sinks. The polje is flooded seasonally, when dissolutional undercutting takes place around the perimeter of enclosing limestone hills, and further alluvium is added to the insoluble shield protecting the floor. In this manner, the polje expands. Ideal poljes up to a few kilometers in length are found in many karst regions. Chains of 10 km (6 mi) poljes and individuals up to 60 km (36 mi) in length occur in former Yugoslavia, where they are partly of tectonic origin—karstic adaptations
Kelvin bridge
559
abandoned caves often marking former floodplain levels
river
floodplain
foot cave
corrosion plain surface
(a) towers continue to reduce by corrosion
alluvium incised
stripped terrace
terrace
incised river
terrace
developing corrosion plain (b) stripping of corrosion plain commences
remnant of stripped corrosion plain
cockpit begins to form
stripped terrace
deeply incised river
shallow water table
unkarstified rock
abandoned foot cave
rejuvenation head in subterranean streams working upstream
deep water table
Fig. 2. Development of a multistage, polygenetic karst landscape of towers, cones, relict caves, corrosion plains, and terraces. (After D. C. Ford and P. W. Williams, Karst Geomorphology and Hydrology, 1989; copyright Chapman and Hall; used with kind permission from Kluwer Academic Publishers)
of downfaulted blocks. The Yugoslav examples are preeminent and are known as classical karst; they contain most of the cultivatable land in many areas, the intervening hills being unusable because of the great densities of karren and sinkholes. See FAULT AND FAULT STRUCTURES. Karst plains and towers. These are the end stage of karst topographic development in some regions, produced by long-sustained dissolution or by tectonic lowering. The plains are of alluvium, with residual hills (unconsumed intersinkhole limestone) protruding through. Where strata are massively bedded and the hills are vigorously undercut by seasonal floods or allogenic rivers, they may be steepened into vertical towers. Tower karst (fenglin in Chinese) is the most accentuated karst topography, with individuals rising to as much as 500 m (1650 ft). The greatest tower karst development is in Guizhou and Guangxi provinces, China. Tower landscapes partly drowned by postglacial rising seas in Haiphong Bay, Vietnam, and Phuket, Thailand, are also famous tourist sights. Where strata are thinner and undercutting is ineffective, the hills tend to be conical. Plains with cone karst are common in Puerto Rico and Cuba. Tectonic uplift may induce incision by trunk rivers, producing complex multistage karst land-
scapes of towers, cones, and corrosion plains (Fig. 2). See CAVE; GEOMORPHOLOGY; WEATHERING PROCESSES. Derek Ford Bibliography. W. Dreybrodt, Processes in Karst Systems, Springer-Verlag, Berlin, 1988; D. C. Ford and P. W. Williams, Karst Geomorphology and Hydrology, Chapman and Hall, London, 1989; J. N. Jennings, Karst Geomorphology, Blackwell, Oxford, 1985; P. T. Milanovic, Karst Hydrogeology, Water Resources Publications, Littleton, CO, 1981; W. B. White, Geomorphology and Hydrology of Karst Terrains, Oxford University Press, 1988; D. Yuan, Karst of China, Geological Publishing House, Beijing, 1991.
Kelvin bridge A specialized version of the Wheatstone bridge network designed to eliminate, or greatly reduce, the effect of lead and contact resistance and thus permit accurate measurement of low resistance. The circuit shown in illus. a accomplishes this by effectively placing relatively high-resistance-ratio arms in series with the potential leads and contacts of the low-resistance standards and the unknown
Kelvin bridge. (a) Actual circuit. (b) Equivalent Wheatstone bridge circuit.
resistance. In this circuit RA and RB are the main ratio resistors, Ra and Rb the auxiliary ratio, Rx the unknown, Rs the standard, and Ry a heavy copper yoke of low resistance connected between the unknown and standard resistors. If the unwanted potential drop along the conductor linking Rs and Rx is divided (by Ra/Rb) in the same ratio (Rs/Rx) as that of the potential drops across Rs and Rx, the zero reading of the galvanometer (G) is unaffected by the actual value of Ry. Therefore, if Ra/Rb is adjustable (for example, by adjusting Ra) so that a null indication of G is obtained no matter what the value of Ry, the basic balance equation for the Wheatstone bridge, Eq. (1), holds:

Rx = (RB/RA)Rs   (1)
A practical way of attaining this condition is to (1) adjust RA/RB with the lowest possible value of Ry (for example, a low-resistance copper link), and then (2) adjust Ra/Rb with Ry infinite (for example, with the link open-circuited). Steps 1 and 2 are carried out alternately as necessary until G is nulled for both conditions without significant further adjustments. The final adjustment of RA/RB obtains the bridge balance condition, Eq. (1). A formal network analysis can be carried out by applying a delta-wye transformation to the network consisting of Ra, Rb, and Ry. The equivalent Wheatstone bridge network shown in illus. b is obtained, where Eqs. (2) hold:

R′s = Ra Ry/(Ry + Ra + Rb)
R′x = Rb Ry/(Ry + Ra + Rb)
R′G = Ra Rb/(Ry + Ra + Rb)   (2)

By an analysis similar to that for
the Wheatstone bridge, it can be shown that for a balanced bridge Eq. (3) holds:

Rx = (RB/RA)Rs + [Ra Ry/(Ra + Rb + Ry)][(RB/RA) − (Rb/Ra)]   (3)

If Eq. (4) is valid,

RB/RA = Rb/Ra   (4)

the second term of Eq. (3) is zero, the measurement is independent of Ry, and Eq. (1) is obtained. As with the Wheatstone bridge, the Kelvin
bridge for routine engineering measurements is constructed using both adjustable ratio arms and adjustable standards. However, the ratio is usually continuously adjustable over a short span, and the standard is adjustable in appropriate steps to cover the required range. See WHEATSTONE BRIDGE. Sensitivity. The Kelvin bridge sensitivity can be calculated similarly to the Wheatstone bridge. The open-circuit, unbalance voltage appearing at the detector terminals may be expressed, to a close degree of approximation, as in Eq. (5):

e = E [r/(r + 1)²] ΔRx/[Rx + Ry r/(r + 1)]   (5)
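The balance and sensitivity relations above can be explored numerically. The following Python sketch uses hypothetical component values (they are not taken from the article) and assumes that r stands for the ratio-arm ratio RB/RA; it evaluates Eq. (3) to show that the yoke term drops out when Eq. (4) is satisfied, and Eq. (5) for the open-circuit unbalance voltage.

# Hypothetical component values, in ohms and volts; chosen only for illustration.
RA, RB = 1000.0, 100.0           # main ratio arms
Ra, Rb = 1000.0, 100.0           # auxiliary ratio arms, so that RB/RA = Rb/Ra (Eq. 4)
Rs, Ry, E = 0.010, 0.005, 2.0    # standard resistor, yoke resistance, supply voltage

def rx_at_balance(RA, RB, Ra, Rb, Rs, Ry):
    """Unknown resistance implied by the balance condition, Eq. (3)."""
    yoke_term = (Ra * Ry / (Ra + Rb + Ry)) * (RB / RA - Rb / Ra)
    return (RB / RA) * Rs + yoke_term

def unbalance_voltage(dRx, Rx, Ry, E, r):
    """Open-circuit detector voltage of Eq. (5) for a small unbalance dRx."""
    return E * (r / (r + 1.0) ** 2) * dRx / (Rx + Ry * r / (r + 1.0))

print(rx_at_balance(RA, RB, Ra, Rb, Rs, Ry))      # 0.001 ohm; the yoke term vanishes
print(rx_at_balance(RA, RB, Ra, 101.0, Rs, Ry))   # Eq. (4) violated: a small error appears
print(unbalance_voltage(1e-6, 0.001, Ry, E, RB / RA))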
The unbalance detector current for a closed detector circuit may be expressed as in Eq. (6):

IG = E [r/(r + 1)²] ΔRx / {[Rx + Ry r/(r + 1)][RG + (RA + RB + Ra + Rb) r/(r + 1)²]}   (6)

The Kelvin bridge requires a power supply capable of delivering relatively large currents during the time a measurement is being made. The total voltage applied to the bridge is usually limited by the power dissipation capabilities of the standard and unknown resistors. Errors. Kelvin bridge resistance-measurement errors are caused by the same factors as for the Wheatstone bridge. However, additional sources of error, as implied by the second term of Eq. (3), must be evaluated since these factors will seldom be reduced to zero. For minimum error the yoke resistance should be made as low as possible by physically placing the commonly connected current terminals of the unknown and standard as close together as possible and connecting with a low-resistance lead. The ratio resistors each include not only the resistance of the resistors but also that of the interconnecting wiring and external leads and the contact resistance of the potential circuit contacts. The external leads are most likely to cause errors, and they should therefore be of the same resistance so that Eq. (4) will be fulfilled as nearly as possible. In addition, they should be relatively short, since the addition of a large resistance (long leads) will introduce an error in the calibrated ratio RB/RA. For precise measurements, trimmer adjustments are required in the ratio-arm circuits and provision is made to connect the bridge resistors into two different Wheatstone bridge configurations. By successively balancing first the Kelvin network and then each of the Wheatstone networks, these additive errors are virtually eliminated. See BRIDGE CIRCUIT; RESISTANCE MEASUREMENT. Charles E. Applegate; Bryan P. Kibble
Kelvin’s circulation theorem A theorem useful for ideal flows that have negligible viscous forces and a nearly constant density (incompressible flows). Such flows are potential flows and
Kelvin’s circulation theorem are applicable to the aerodynamics of wings. One characteristic of potential flows is a lack of vorticity. Vorticity characterizes the solid-body-like rotation of the fluid particles as they translate and deform. If a small element of fluid in a flow were instantaneously frozen, vorticity would indicate its spinning motion. Potential flows have zero vorticity. See POTENTIAL FLOW; VORTICITY; WING. Circulation is a number associated with the spinning motion of a large region of a flow. Consider a plane flow such as the flow around a wing that spans a very big wind tunnel (Fig. 1). Imagine that you choose a certain closed path to walk around in a counterclockwise direction. As you walk at a uniform pace, measure and record the wind velocity component that is in the same direction as your path. Ignore the side component. Wind velocity in the same direction as your path is positive, while wind velocity against your path is negative. When you return to the starting point, the average velocity number indicates the circulation for that circuit. Actually, the circulation is the average wind component times the length of the path. Mathematically it can be shown that the circulation is the area integral of the vorticity over the region bounded by the closed-path curve, hence the interpretation as the global spinning character of a large region. In potential flows, the average velocity along the path is always zero for any path within the potential flow region. The average flow against the path is the same as the average flow with the path. Kelvin’s circulation theorem states that the circulation for a circuit of material particles is constant with time in an ideal flow. That is, the average rate of spinning of a piece of fluid is constant as it moves with the flow. This is a direct result of the lack of friction. As an example, consider a plane vortex flow. This is a swirling potential flow of circular streamlines with zero vorticity everywhere except for the central core, where the vorticity is very high because of viscous forces. Any closed circuit that excludes the core has a circulation of zero. Any closed circuit that includes the core has the same finite circulation. The circulation is a measure of the strength of the vortex. Consider a wing, helicopter blade, or fan blade starting from rest. The original and current positions are shown in Fig. 2. If a fluid is at rest, the circulation is zero for any circuit. A region of potential flow starting from rest will continue to have a circulation of zero for all time. As the wing starts moving, viscous forces are important in a boundary layer near the surface and produce a thin region of vorticity. The boundary layer fluid is not an ideal potential flow. This region is swept off and forms the core of a starting vortex that remains near the starting position. A thin viscous wake connects the starting vortex to the viscous boundary layer region that continues with the wing. A circuit around the wing—outside the boundary layer, the wake, and the starting vortex core—has zero circulation. However, a circuit cutting the viscous wake and including the vortex has a circulation of one sense; and a circuit cutting the wake and including the wing has a circulation
Fig. 1. Cross-section view of a wing in a wind tunnel depicting a circulation path.
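The path-average definition of circulation given above can be checked numerically. The following Python sketch (not part of the original article) integrates the along-path velocity component around closed circuits in the velocity field of a single ideal line vortex; a circuit that encloses the vortex core returns the vortex strength, while one that excludes it returns essentially zero.

import numpy as np

def circulation(path_xy, velocity):
    """Sum of the along-path velocity component times segment length
    around a closed path given as an (N, 2) array of points."""
    v = velocity(path_xy)
    d = np.roll(path_xy, -1, axis=0) - path_xy    # segment vectors closing the loop
    return float(np.sum(v[:, 0] * d[:, 0] + v[:, 1] * d[:, 1]))

def point_vortex(xy, gamma=1.0):
    """Velocity field of an ideal line vortex of strength gamma at the origin."""
    x, y = xy[:, 0], xy[:, 1]
    r2 = x**2 + y**2
    return (gamma / (2.0 * np.pi)) * np.column_stack((-y / r2, x / r2))

t = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
enclosing = np.column_stack((2.0 * np.cos(t), 2.0 * np.sin(t)))   # circuit around the core
excluding = np.column_stack((5.0 + np.cos(t), np.sin(t)))         # core lies outside

print(circulation(enclosing, point_vortex))   # close to 1.0, the vortex strength
print(circulation(excluding, point_vortex))   # close to 0.0

This mirrors the statement that any closed circuit excluding the core has zero circulation, while any circuit including it has the same finite circulation.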
1. <sentence> → <subject> <verb> <object>
2. <subject> → <article> <noun>
3. <object> → <article> <noun>
4. <verb> → SEES
5. <article> → THE
6. <noun> → BOY
7. <noun> → GIRL
From this set of production rules, we can distinguish the elements that make up the grammar. The set of nonterminal symbols, N, is formed by the entities enclosed in pointed brackets. That is, N = {<sentence>, <subject>, <object>, <verb>, <article>, <noun>}. The alphabet Σ has four elements, namely, Σ = {SEES, THE, BOY, GIRL}. Each of the elements of the alphabet should be considered an indivisible unit. The replacement rules indicate how sequences of words in the language can be put together to generate sentences. For the given set of replacement rules, the interpretation of the first production is that a sentence is formed by a <subject> followed by a <verb> followed by an <object>. Similar interpretations can be given to rules 1 through 3. Rules 4 through 7 indicate that SEES, THE, BOY, and GIRL are instances of a <verb>, an <article>, or a <noun>, respectively. When there is more than one choice for a particular entity, such as <noun>, the entity can be replaced by the right-hand side of one of the production rules. In this case, <noun> can be replaced by either BOY or GIRL. The starting symbol of this grammar is, by convention, the element of N that appears at the left of the symbol → of the first production. In this case, <sentence> is the starting nonterminal. One sentence generated by this grammar is THE GIRL SEES THE BOY. The sequence of replacement rules used to generate or derive this sentence is as follows:
<sentence> → <subject> <verb> <object>
<sentence> → <article> <noun> <verb> <object> (using production 2 to replace <subject>)
<sentence> → THE <noun> <verb> <object> (using production 5 to replace <article>)
<sentence> → THE GIRL <verb> <object> (using production 7 to replace <noun>)
<sentence> → THE GIRL SEES <object> (using production 4 to replace <verb>)
<sentence> → THE GIRL SEES <article> <noun> (using production 3 to replace <object>)
<sentence> → THE GIRL SEES THE <noun> (using production 5 to replace <article>)
<sentence> → THE GIRL SEES THE BOY (using production 6 to replace <noun>)
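The production rules and the derivation just shown can be encoded directly. The short Python sketch below (not part of the original article) stores the seven rules and replays the same sequence of replacements, printing THE GIRL SEES THE BOY at the last step; representing the rules as numbered pairs is simply a convenient choice for illustration.

# The seven production rules of the grammar, numbered as in the text.
productions = {
    1: ("<sentence>", ["<subject>", "<verb>", "<object>"]),
    2: ("<subject>",  ["<article>", "<noun>"]),
    3: ("<object>",   ["<article>", "<noun>"]),
    4: ("<verb>",     ["SEES"]),
    5: ("<article>",  ["THE"]),
    6: ("<noun>",     ["BOY"]),
    7: ("<noun>",     ["GIRL"]),
}

def apply_rule(form, rule_no):
    """Replace the leftmost occurrence of the rule's left-hand side."""
    lhs, rhs = productions[rule_no]
    i = form.index(lhs)
    return form[:i] + rhs + form[i + 1:]

form = ["<sentence>"]
for rule in (1, 2, 5, 7, 4, 3, 5, 6):   # the derivation given above
    form = apply_rule(form, rule)
    print(" ".join(form))                # last line: THE GIRL SEES THE BOY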
Fig. 1. Derivation tree of the sentence THE GIRL SEES THE BOY.
Notice that all the elements of the sentence THE GIRL SEES THE BOY belong to the set of terminals. At each step of the derivation, a nonterminal on the right-hand side of a production was replaced by another sequence of nonterminal symbols or by a single terminal. The three additional sentences that can be generated using the production rules of this grammar are: THE BOY SEES THE GIRL, THE BOY SEES THE BOY, THE GIRL SEES THE GIRL. These four sentences form the language generated by the grammar. As this example shows, it is not necessary to explicitly indicate the elements of N, Σ, P, and S of the definition of a grammar since all these elements can be derived directly from the set of production rules. Figure 1 shows a derivation tree of the sentence generated before. Types of grammars. Grammars can be classified according to the nature of their productions. If G = (N, Σ, P, S) and for every production α → β in P, |β| ≥ |α|, the grammar is said to be context-sensitive or type 1. Some authors require that the productions of this type of grammar be of the form α1Aα2 → α1βα2, where α1, α2, and β are strings, β ≠ ε, and A ∈ N. This restriction, which does not alter the language generated by the grammar, motivates the term context-sensitive because A can be replaced by β
only within the context of α 1 and α 2 according to α 1Aα 2 → α 1βα 2. If the productions α → β of the grammar G are such that |α| = 1 and α is a single element of N and β is a string of terminals and/or nonterminals, then the grammar is a context-free grammar or type 2. Context-free grammars and the language that they generate are very important in computer science since most of the computer languages are generated by this type of grammar. Acceptors, generators, and translators. The relationship between abstract languages and machines can be established through three types of automata: acceptors, generators, and translators. An acceptor is a deterministic finite-state machine which, starting in a predefined initial state, accepts or rejects sequences of input symbols from an alphabet. We say that a finite-state machine operates deterministically if its responses depend solely on its initial state and the input sequence presented. In simple terms, we can picture an acceptor as a machine connected to a lamp which can be turned on and off. As the machine processes each of the symbols of the input sequence, the machine turns the lamp on or off. If the machine turns the lamp on after processing the last symbol of the input sequence, the machine accepts the input sequence; otherwise, the machine rejects the sequence. A generator is a nondeterministic finite-state machine. When the generator is started from its initial state, it produces a sequence of symbols from a given alphabet known as the output alphabet. It is assumed that the machine will act according to chance producing different sequences each time it is operated. The language of the machine is the set of all sequences that it will ever produce. A translator is a machine that after processing an input sentence (made up of symbols of an input alphabet) translates the sentence into another sentence (made of symbols of an output alphabet). Translators are important not only in the translation of natural languages (for example, from English
Fig. 2. Hierarchy of abstract machines and languages. (After P. Denning et al., Machines, Languages, and Computation, 1978)
to Spanish) but also in the translation of computer languages (for example, from C++ to assembly or machine language). See NATURAL LANGUAGE PROCESSING; PROGRAMMING LANGUAGES. Figure 2 shows a hierarchy of automata and the corresponding hierarchy of languages that they define. Although some of the topics mentioned in this hierarchy are out of the scope of this article, they are included here for completeness. Ramon A. Mata-Toledo Bibliography. A. V. Aho et al., The Theory of Parsing, Translation, and Compiling, vol. 1, 1972; J. E. Hopcroft et al., Introduction to Automata Theory, Languages, and Computation, 2d ed., 2000; J. Martin, Introduction to Languages and the Theory of Computation, 2002.
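The notion of an acceptor described above can be made concrete in a few lines of code. The Python sketch below (an invented machine, not one taken from the article) implements a deterministic finite-state acceptor over the alphabet {0, 1} whose "lamp" is on exactly when the input contains an even number of 1s.

# A deterministic finite-state acceptor over the alphabet {0, 1}. The machine,
# which accepts exactly the strings containing an even number of 1s, is
# invented for illustration and does not come from the article.
transitions = {("even", "0"): "even", ("even", "1"): "odd",
               ("odd", "0"): "odd", ("odd", "1"): "even"}
accepting = {"even"}                       # the "lamp on" states

def accepts(symbols, start="even"):
    state = start
    for s in symbols:
        state = transitions[(state, s)]    # the lamp is updated after each symbol
    return state in accepting              # is the lamp on after the last symbol?

print(accepts("1001"))   # True: two 1s
print(accepts("10"))     # False: one 1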
Lanolin
A soft, waxy material derived from the greasy coating on raw wool. In the processing of wool, the fleece is scoured with an aqueous alkaline solution to remove debris, wax, water-soluble material, and free acids. The insoluble fraction, which constitutes about 15–20% of the weight of the original wool, is crude lanolin. Lanolin is a very complex mixture of esters, similar to the skin lipids of birds and other animals. Analysis is carried out after saponification, which gives equal weights of acids and alcohols. The acids consist of 50–60% saturated fatty acids with chains up to C38–C40, comprising mostly isostructures or anteisostructures with a methyl group that is one or two carbons removed from the end of the chain [CH3CH(CH3)— or CH3CH2CH(CH3)—]. Another group (30–35%) comprises α-hydroxy acids and a small amount of ω-hydroxy acids. See ESTER. The alcohols from lanolin are of two types. About 15% are monohydroxy compounds with the same chain structures as the acids, and a minor amount of 1,2-diols. The major alcohol components are 35% cholesterol and 35–40% lanosterol. The latter is a key intermediate in the biosynthesis of steroids from squalene. See CHOLESTEROL; SQUALENE; STEROID. The supply of lanolin depends directly on the level of wool production; the consumption of lanolin and derived products in the United States is about 1 × 10⁷ lb (4.5 × 10⁶ kg) per year. Crude lanolin from the wool-scouring process can be refined to a colorless, odorless material (USP grades). A particularly useful property of lanolin is the formation of stable water-in-oil emulsions containing up to 25% water (hydrous lanolin). Major uses are as an emollient and skin moisturizer in lotions and cosmetic products, and in medicinal ointments. Unrefined lanolin has some use in inks and as a corrosion and rust preventative. See FAT AND OIL; WOOL. James A. Moore

Lanthanide contraction
The name given to an unusual phenomenon encountered in the rare-earth series of elements. The radii of the atoms of the members of this series decrease slightly as the atomic number increases. Starting with element 58 in the periodic table, the balancing electron fills in an inner incomplete 4f shell as the charge on the nucleus increases. According to the theory of atomic structure, this shell can hold 14 electrons; so starting with element 58, cerium, there are 14 true rare earths. Lanthanum has no electrons in the 4f shell, cerium has 1, and lutetium, 14. The 4f electrons play almost no role in chemical valence; therefore, all rare earths can have three electrons in their valence shell and they all exist as trivalent ions in solution. As the charge on the nucleus increases across the rare-earth series, all electrons are pulled in closer to the nucleus so that the radii of the rare-earth ions decrease slightly as the compounds go across the rareearth series. Any given compound of the rare earths is very likely to crystallize with the same structure as any other rare earth. However, the lattice parameters become smaller and the crystal denser as the
Atomic and ionic radii of rare-earth metals Element Sc Y La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu ∗
Radius of M3⫹∗, nm
0.1061 0.1034 0.1013 0.0995 0.0979 0.0964 0.0950 0.0938 0.0923 0.0908 0.0894 0.0881 0.0869 0.0858 0.0848
Metal crystal structure† hcp hcp hcp fcc hcp hcp
0.16545‡ 0.18237 0.18852 0.18248 0.18363 0.18290
0.16280§ 0.17780 0.18694
rhom-hcp bcc hcp hcp hcp hcp hcp hcp fcc hcp
0.18105 0.1994 0.18180 0.18005 0.17952 0.17887 0.17794 0.17688 0.19397 0.17516
0.17943
Trivalent rare-earth ion. hexagonal close-packed; fcc, face-centered cubic; rhom, rhombic; bcc, body-centered cubic. values in this column calculated from atoms in basal plane. § All values in this column are for radii between layers. † hcp, ‡ All
Metallic radii, nm
0.18201 0.18139
0.17865 0.17626 0.17515 0.17428 0.17340 0.17237 0.17171
compounds proceed across the series. This contraction of the lattice parameters is known as the lanthanide contraction. For many compounds the lattice parameters decrease only partway across the series, and when the contraction has progressed to that point, a new crystalline form develops. Frequently, both crystalline forms can be observed for a number of the elements. For this reason, the rare-earth series is of particular interest to scientists because many of the parameters determining the properties of a substance can be kept constant while the lattice spacings can be varied in small increments across the series. The atomic and ionic radii of atoms are not clearly defined. The atoms can be polarized by the neighboring atoms and there is no clear-cut boundary between the electrons associated with one atom and another. Therefore, the atomic radii will vary somewhat from compound to compound, and the absolute values depend on the method of calculation. However, if most of the parameters are assumed constant, and the difference in lattice parameters in the rare-earth crystalline series is attributed to the rare-earth ion or atom, the lanthanide contraction becomes evident. Although scandium and yttrium are not members of this series, the information is usually wanted at the same time and is given for completeness. The atomic radii of the trivalent ion and the metal atoms are given in the table. See PERIODIC TABLE; RARE-EARTH ELEMENTS. Frank H. Spedding
Lanthanum
A chemical element, La, atomic number 57, atomic weight 138.91. Lanthanum, the second most abundant element in the rare-earth group, is a metal. The naturally occurring element is made up of the isotopes ¹³⁸La, 0.089%, and ¹³⁹La, 99.91%. ¹³⁸La is a radioactive positron emitter with a half-life of 1.1 × 10¹¹ years. The element was discovered in 1839 by C. G. Mosander and occurs associated with other rare earths in monazite, bastnasite, and other minerals. It is one of the radioactive products of the fission of uranium, thorium, or plutonium. Lanthanum is the most basic of the rare earths and can be separated rapidly from other members of the rare-earth series by fractional crystallization. Considerable quantities
of it are separated commercially, since it is an important ingredient in glass manufacture. Lanthanum imparts a high refractive index to the glass and is used in the manufacture of expensive lenses. The metal is readily attacked in air and is rapidly converted to a white powder. Lanthanum becomes a superconductor below about 6 K (−449◦F) in both the hexagonal and face-centered crystal forms. See RARE-EARTH ELEMENTS. Frank J. Spedding Bibliography. F. A. Cotton et al., Advanced Inorganic Chemistry, 6th ed., Wiley-Interscience, 1999; K. A. Gschneidner Jr., J.-C. B¨ unzli, and V. K. Pecharsky (eds.), Handbook on the Physics and Chemistry of Rare Earths, 2005.
Laplace transform
An integral extensively used by P. S. Laplace in the theory of probability. In simplest form it is expressed as Eq. (1):

f(s) = ∫₀^∞ e^(−st) φ(t) dt   (1)

It is thought of as transforming the determining function φ(t) into the generating function f(s). The variable t is real, the variable s may be real or complex, s = σ + iτ. As an example, if φ(t) = 1 the integral converges for σ > 0, and f(s) = 1/s. The Laplace transform is used for the solution of differential and difference equations, for the evaluation of definite integrals, and in many branches of abstract mathematics (functional analysis, operational calculus, and analytic number theory).
Method. Extensive tables of Laplace transforms exist, and these are used as any table of integrals. To show how a differential equation may be solved, two excerpts (A and B) from such a table can be used:
A. f(s) = 1/(s − a), φ(t) = e^(at)
B. f(s) = 1/(s² + 1), φ(t) = sin t
Suppose it is required to find a solution y(t) of Eq. (2) such that y(0) = 1, y′(0) = 2:

y″(t) + y(t) = 2e^t, where y″ = d²y/dt² and y′ = dy/dt   (2)

Denote the
Laplace transform of the unknown function y(t) by Y(s). Integration by parts gives Eq. (3), on the assumption that the integrated part is zero at t = ∞:

∫₀^∞ e^(−st) y″(t) dt = −y′(0) − y(0)s + s² ∫₀^∞ e^(−st) y(t) dt = −2 − s + s²Y(s)   (3)

Applying the Laplace transform to Eq. (2) and using A for the right-hand side, one obtains Eq. (4):

−2 − s + s²Y(s) + Y(s) = 2/(s − 1)   (4)

The differential equation has become an algebraic one, whose solution is Eq. (5):

Y(s) = 1/(s − 1) + 1/(s² + 1)   (5)

However, a further use of the table
shows that the Laplace transform of y(t) = e^t + sin t is precisely the right-hand side of Eq. (5). Assuming uniqueness, one has thus obtained the required solution. Because its properties can be checked directly, the unproved assumptions need not be verified. This example illustrates the general method. The unknown function is taken as the determining function and the Laplace transform is applied to the differential (or difference) equation. There results an equation with the generating function as unknown, and this must be solved. Finally the determining function must be determined from the generating function, either from tables or by use of an inversion formula. In general, if the original differential equation is partial in any number of independent variables, one application of the Laplace transform reduces the number of these variables by one. If the equation was ordinary (one independent variable), the transformed equation is algebraic, as in the above example.
Properties. Here are the fundamental properties of the Laplace transform:
I. There exists a number σc (perhaps +∞ or −∞) called the abscissa of convergence such that the integral in Eq. (1) converges for σ > σc, diverges for σ < σc. That is, the region of convergence is a half-plane (a half-line if s is real).
II. The generating function is holomorphic for σ > σc.
III. The determining function is uniquely determined by the generating function. (Ambiguity is possible only on sets of measure zero.)
IV. The product of two generating functions is in general a generating function. Thus, if Eq. (1) holds for two pairs of functions f1(s), φ1(t) and f2(s), φ2(t), then the product f1(s)f2(s) is the transform of the convolution

φ1(t) ∗ φ2(t) = ∫₀^t φ1(u) φ2(t − u) du
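The worked example above can be reproduced with a computer algebra system. The following Python sketch uses the SymPy library (assumed to be available; it is not part of the original article) to solve the initial-value problem of Eq. (2) and to recover table entries A and B directly.

import sympy as sp

t, s = sp.symbols("t s", positive=True)
y = sp.Function("y")

# The initial-value problem of Eq. (2): y'' + y = 2*exp(t), y(0) = 1, y'(0) = 2.
ode = sp.Eq(y(t).diff(t, 2) + y(t), 2 * sp.exp(t))
sol = sp.dsolve(ode, y(t), ics={y(0): 1, y(t).diff(t).subs(t, 0): 2})
print(sol)   # y(t) = exp(t) + sin(t)

# Table entries A (with a = 1) and B, recovered directly:
print(sp.laplace_transform(sp.exp(t), t, s, noconds=True))   # 1/(s - 1)
print(sp.laplace_transform(sp.sin(t), t, s, noconds=True))   # 1/(s**2 + 1)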
As was evident in the above example, it is very important to be able to derive the determining function φ(t) from the generating function f(s). This is especially true when tables are unavailable or inadequate. Any expression of φ(t) in terms of f(s) is called an inversion formula. Many are known. The classical one is Eq. (6), in which the integration is along any vertical line from c − i∞ to c + i∞ lying in the half-plane of convergence:

φ(t) = [1/(2πi)] ∫ f(s) e^(st) ds   (6)

Lattice (mathematics)
A partial ordering ≤ of a set S is a binary relation with the following properties:
(P1) x ≤ x for all x ∈ S
(P2) If x ≤ y and y ≤ x, then x = y
(P3) If x ≤ y and y ≤ z, then x ≤ z
If ≤ is any partial ordering of S, then its converse or dual ≥, defined by statement (1),

x ≥ y if and only if y ≤ x   (1)

is also a partial
ordering of S. This easily verified fact provides a fundamental duality principle, which is useful in many connections. Suppose, for example, the join x ∪ y of two elements x and y of a partially ordered set S is defined by conditions (2a) and (2b):

x ≤ x ∪ y and y ≤ x ∪ y   (2a)
If x ≤ z and y ≤ z, then x ∪ y ≤ z   (2b)

(It is easily shown that
there is at most one such x ∪ y.) Then the duality principle suggests defining the meet x ∩ y by Eqs. (3a) and (3b), and shows that there is at most one such x ∩ y:

x ≥ x ∩ y and y ≥ x ∩ y   (3a)
If x ≥ z and y ≥ z, then x ∩ y ≥ z   (3b)
A lattice is defined as a partially ordered set (poset, for short) in which any two elements x and y have a
meet x ∩ y and a join x ∪ y. These binary operations satisfy the four basic identities:

(L1) x ∩ x = x ∪ x = x
(L2) x ∩ y = y ∩ x and x ∪ y = y ∪ x
(L3) x ∩ (y ∩ z) = (x ∩ y) ∩ z and x ∪ (y ∪ z) = (x ∪ y) ∪ z
(L4) x ∩ (x ∪ y) = x ∪ (x ∩ y) = x
The operations ∩ and ∪ are connected with the relation ≤ by the condition that x ≤ y, x ∩ y = x, and x ∪ y = y are three equivalent statements. Conversely, if L is an algebraic system with operations ∩ and ∪ satisfying (L1) to (L4) for all x, y, z, then the preceding condition defines ≤ as a partial ordering of L, with respect to which ∩ and ∪ have the meanings defined above. This principle was discovered in 1880 by C. S. Peirce.
Kinds of lattices. There are many different kinds of lattices. Thus the real numbers form a lattice if x ≤ y is given its usual meaning. This lattice is simply ordered, in the sense that

(P4) Given x and y, either x ≤ y or y ≤ x
Any such simply ordered set (or chain) is a lattice, in which x ∪ y is simply the larger of x and y, and dually. Again, the set J of positive integers forms a lattice, if one lets m ≤ n mean "m divides n" (usually denoted m | n). In this case, m ∩ n = gcd (m,n) and m ∪ n = lcm (m,n). Still again, one can let the family of all subsets S, T, . . . of a fixed ensemble I be ordered by letting S ≤ T mean that every point in S is in T. Then this family is a lattice, in which S ∩ T is the intersection of S and T, whereas S ∪ T is their union. Actually, it is a boolean algebra. In all the preceding lattices, the distributive laws hold: (L5)
x ∩ (y ∪ z) = (x ∩ y) ∪ (x ∩ z) and x ∪ (y ∩ z) = (x ∪ y) ∩ (x ∪ z), for all x, y, z
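For the divisor example above (meet = gcd, join = lcm), the distributive law (L5) can be checked mechanically. The short Python sketch below is added only as an illustration and is not part of the article.

from math import gcd
from itertools import product

def lcm(a, b):
    return a * b // gcd(a, b)

divisors_of_12 = [1, 2, 3, 4, 6, 12]    # the lattice of Fig. 4

# Distributive law (L5), with the meet read as gcd and the join as lcm.
assert all(gcd(x, lcm(y, z)) == lcm(gcd(x, y), gcd(x, z))
           for x, y, z in product(divisors_of_12, repeat=3))
print("the divisors of 12 form a distributive lattice under gcd and lcm")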
Such lattices are called distributive lattices. Any chain is a distributive lattice; so is any boolean algebra. More generally, a ring of sets is defined as a family of subsets of a fixed set I which contains, with any S and T, also their intersections S ∩ T and their union S ∪ T. Then any ring of sets is a distributive lattice. It is obvious that each of the two identities of (L5) is dual to the other. It is a curious fact that, in a lattice, each also implies the other. If G is any group, then its subgroups form a lattice, and its normal subgroups also form a lattice, in both cases under inclusion. The normal subgroups satisfy the (self-dual) modular law: (L6)
Fig. 1. Ordinal number 4.
Fig. 2. The projective line over the field Z2 of integers mod 2.
If x ≤ z, then x ∪ (y ∩ z) = (x ∪ y) ∩ z
In general, lattices satisfying (L6) are called modular and every distributive lattice is modular. The lattice of all linear subspaces of the n-dimensional vector space Vn(F) over any field (or division ring) F is also a modular lattice, usually called the (n − 1)-dimensional projective geometry Pn−1(F) over F. This lattice contains special elements 0 (the zero vector) and I = Vn(F) (the whole space), such that

(P5) 0 ≤ x ≤ I for all x
Such special elements always exist in any lattice whose chains all have finite length, but they need not exist in general; for example, they do not in the simply ordered set of real numbers. The lattice Pn−1(F) is complemented, in the sense that each subspace x has at least one complement x′, with the property

(L7) x ∩ x′ = 0 and x ∪ x′ = I
Thus Pn−1(F) is a complemented modular lattice. Similarly, it may be verified that the class of boolean algebras is precisely the class of complemented distributive lattices. This principle enables one to consider boolean algebra as a branch of lattice theory. Lattices L containing few elements can be conveniently visualized by diagrams. In these diagrams, small circles represent elements of L, a being higher than b whenever a > b. A segment is then drawn from a to b whenever a > b, but no x exists such that a > x > b. Any such diagram defines L up to isomorphism: a > b if and only if one can travel from a to b along a descending broken line. Figures 1–6 are typical of such diagrams.
Fig. 3. The simplest nonmodular lattice.
Fig. 4. The lattice of divisors of 12, under divisibility.
Fig. 5. The boolean algebra of order 8.
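The claim above that boolean algebras are exactly the complemented distributive lattices can be spot-checked on the eight-element boolean algebra of Fig. 5, realized as the subsets of a three-element set. A small Python sketch, added purely as an illustration:

from itertools import combinations

universe = frozenset({"a", "b", "c"})
subsets = [frozenset(c) for r in range(4)
           for c in combinations(sorted(universe), r)]    # the 8 elements of Fig. 5

# Distributivity of intersection over union, and a complement for every element.
assert all(x & (y | z) == (x & y) | (x & z)
           for x in subsets for y in subsets for z in subsets)
assert all(any(x & c == frozenset() and x | c == universe for c in subsets)
           for x in subsets)
print("the eight subsets form a complemented distributive lattice")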
Such graphs often give useful information very simply. For example, let a finite lattice be called semimodular if any two elements a and b immediately above (covering) a given element c are also immediately under (covered by) another element d = a ∪ b. This condition can easily be tested by inspection. Dedekind showed that a finite lattice L was modular if and only if it and its dual were both semimodular. For L to be distributive, the extra condition of containing no subgraph such as that of Fig. 2 is necessary and sufficient. Applications to algebra and geometry. Lattices, like groups and rings, can be defined as abstract algebras, that is, as systems of elements combined by universally defined operations. These operations may be unary, binary, or ternary. In any such abstract algebra A, define a subset S to be a subalgebra of A when the result of performing any operation of A on elements in S is again in S. Call an equivalence relation a ≡ b (mod θ) on A a congruence relation when, for any n-ary operation f of A, ai ≡ bi (mod θ) for i ≡ 1, . . . , n implies that f(a1, . . . , an ≡ f(b1, . . . , bn)
Fig. 6. Lattice of partitions of [a,b,c,d].
(mod θ). Then the subalgebras of A form one (complete) lattice, and the congruence relations form another. Such results, which are true of abstract algebras in general, are called theorems of universal algebra. Another such result is the theorem that any algebra can be decomposed into subdirectly irreducible algebras. From this it follows easily that any distributive lattice is isomorphic with a ring of sets. In groups, rings, and many other algebras, all congruence relations are permutable; it follows that the congruence relations form a modular lattice. This fact, combined with the existence of one-element subalgebras, permits the development of an extensive structure theory that includes a unique factorization theorem under appropriate finiteness conditions. Curiously, lattices themselves satisfy a unique factorization theorem, but for another reason. It is important to note that the lattice products LM, involved in the statement of the preceding unique factorization theorem, are one of the six basic operations of a general arithmetic of partially ordered sets, which contains the usual cardinal and ordinal arithmetic as a special case. Thus, the most general finite boolean algebra is just 2n, where 2 is the ordinal number two and n is a finite cardinal number. In any lattice with O, a “point” is an element that covers O. In most lattices arising in geometry, every element is a join of a finite number of points; semimodular lattices with this property are called geometric lattices. The lattice of all partitions of a finite set into nonoverlapping subsets is such a geometric lattice. Figure 6 depicts the geometric lattice of all partitions of four objects. Any geometric lattice is complemented; conversely, any complemented modular lattice in which all chains are finite is geometric. It was mentioned above that the (n − 1)dimensional projective geometry Pn−1(F) over a division ring F was always a complemented modular lattice. Many interesting applications of lattice theory to geometry are extensions of this observation. For instance, let P be any abstract projective geometry, defined by incidence relations. Then P is also a complemented modular lattice. Conversely, any finite-dimensional complemented modular lattice is a product of projective geometries and a boolean algebra. The preceding abstract combinatorial approach to projective geometries led to the construction in 1936, by J. von Neumann, of his continuousdimensional projective geometries (so-called continuous geometries). O. Frink, in 1946, developed a parallel theory of projective geometries of discretely infinite dimension. By analogy, affine geometries can also be regarded as lattices, and any affine geometry in which all chains are finite is a geometric lattice. Although important applications of geometric lattices to combinatorial problems have been made by G.-C. Rota and others since 1960, the theory of semimodular lattices in general is quite limited. A negative result is Dilworth’s theorem, which states that
Lattice (mathematics) every finite lattice is isomorphic with a sublattice of a finite semimodular lattice. Similarly, P. M. Whitman has shown that every lattice is isomorphic with a sublattice of the lattice of all partitions of some (infinite) class. See ABSTRACT ALGEBRA; COMBINATORIAL THEORY. Relation to set theory and logic. In set theory, one is frequently concerned with various special families of sets such as closed sets, open sets, measurable sets, and Borel sets. Some of these families form boolean algebras. But the closed sets (and, dually, the open sets) of a topological space X usually form just an uncomplemented distributive lattice L(X). In L(X), the complemented elements (which necessarily form a boolean algebra) are those which disconnect X. Both the theory of finite sets and the theory of finite distributive lattices are fairly trivial. Each such lattice L has a unique representation as the cardinal power L = 2P of the two-element boolean algebra 2, with a general finite partially ordered set P as exponent. Moreover, any finite-dimensional distributive lattice is finite. If L is an infinite-dimensional lattice, distributive or not, then one can define several topologies on L, which may or may not be equivalent. Most of these are suggested by a consideration of the special case that L is the rational number system. Adapting Dedekind cuts, as first shown in 1935 by H. M. MacNeille, one can extend any lattice (or partially ordered set) to a complete lattice, in which any subset A of elements xα has a least upper bound (join) ∨ xα and greatest lower bound ∧ xα. In such a complete lattice, one can define lim sup {xn} and lim inf {xx} for any sequence x1, x2, x3, . . . of elements. One can define a convergence topology in L by letting xn → x mean that lim sup {xn} = lim inf {xn} = x. Alternatively, one can define an interval topology by taking the closed intervals [a, b], each consisting of all x L with a < =x< = b, as subbasis of closed sets. The concept of convergence, in lattices and other spaces, can be extended by considering directed sets of indices α, this concept itself being latticetheoretic. To describe convergence in topological spaces not satisfying countability axioms, such directed sets are essential. Using the concepts of lattice completeness and lattice topology, one can give interesting characterizations of various lattices associated with topological spaces. Thus, this was done for the lattice of regular open sets by S. Ulam and G. Birkhoff. It was done for the (distributive) lattice of continuous functions by Irving Kaplansky. Finally, it was done for closed (and open) sets by J. C. C. McKinsey and Alfred Tarski, all since 1935. An interesting analogy between set theory and logic is provided by the equivalence between the algebra of open sets and brouwerian logic, which arose entirely from a consideration of formal logic. Abstractly, both deal with relatively pseudocomplemented distributive lattices. Another type of logic, directly suggested by quantum theory, was developed in 1936 by G. Birkhoff and von Neumann. In this logic, properties form
an orthocomplemented modular lattice. The aristotelian logic of classical mechanics appears as the special, distributive (hence boolean) case of permutable observations. The formal properties of the probability p[x] of an event x are shown to be similar to those of dimension in projective geometry. Lattice theory also gives perspective on other kinds of formal logic, including modal logic and “strict implication.” However, since the deepest problems of formal logic concern the foundations of set theory, these applications will not be discussed here. Lattice-ordered groups. The first study of latticeordered groups was made in 1897 by Dedekind, in analyzing the properties of the rational number system R. Letting a < = b signify that ma = b for some integer m (that is, that b is a multiple of a), the nonzero rational numbers form a lattice in which a ∩ b and a ∪ b again mean the lcm and gcd of a and b. Also, with respect to ordinary multiplication, the nonzero rationals form a (commutative) group. Finally, in R, implications (4a) hold; that is, every group translaa b implies ac bc
(4a)
f (x) g(x) implies f (x) + c(x) g(x) + c(x) (4b) > λg > 0 imply λf = > g and λ = f =
(4c)
tion is a lattice automorphism. Any lattice L which is also a group, and which shares this property, is called a lattice-ordered group, or l-group. Though many noncommutative l-groups exist, it is a striking fact that every complete l-group is necessarily commutative. Lattice-ordered groups arise in function theory, as well as in number theory. If f < = g means that f(x) < g(x) for all (or almost all) x, then most func= tion spaces of real functions form lattices. They are also commutative groups under addition. Moreover, implication (4b) holds for any c(x); this is simply (4a) in additive notation. Hence most (real) function spaces are l-groups. In addition, they are vector spaces, in which implication (4c) holds. The l-groups with these additional properties are vector lattices. Although E. H. Moore and F. Riesz had discussed related ideas earlier, the first systematic analysis of vector lattices as such was made in 1937 by L. V. Kantorovitch. The application of vector lattice concepts to function theory is still not very far advanced. Using the intrinsic lattice topologies defined earlier, and others related to them, one can avoid the necessity of introducing a distance function, in many function spaces. Thus, the notion of metric boundedness is equivalent to order boundedness for functionals on any Banach lattice, and metric convergence is equivalent to relative uniform star-convergence, which is a purely lattice-theoretic concept. An additive l-group which is also a ring is called a lattice-ordered ring or l-ring when its multiplication satisfies the partial analog of implication (4b) shown
699
700
Lattice vibrations in formula (5). Such rings have been studied sysIf
> 0 and g = > 0, then fg ≥ 0 f =
(5)
tematically only since 1955; a typical theorem about them is the following: An l-ring is a product of simply ordered rings if and only if it satisfies implication (6). > 0 imply ca ∩ b = ac ∩ b = 0 a ∩ b = 0 and c = (6) Further applications. It is clear that the concepts of vector lattice and of l-ring are essential in various physical applications. This was first apparent in connection with the ergodic theorem, as proved in 1931 by Birkhoff and von Neumann, for the deterministic processes of classical mechanics. A generalization of this theorem to stochastic processes, whose natural formulation is based on the concept of a vector lattice, was proved in 1939–1941 by Shizuo Kakutani and Kosaku Yosida. A second application is to the theory of Reynolds operators, or averaging operators, arising in turbulent fluid motions. The essential connection with the order relation is simply the obvious principle that any average of nonnegative quantities is nonnegative. Using this principle and the theory of l-rings, one can decompose (subdirectly) any vector-averaging operator into scalar components. A third application is to the concept of criticality in nuclear reactor theory. Neutron chain reactions involve the birth (through fission), migration, and death (through absorption) of neutrons. The laws governing the evolution of the statistical distributions of neutrons (as functions of position x, velocity υ, and time t) evidently carry nonnegative distributions into nonnegative distributions. To deduce the mathematical principle that the neutron distribution must satisfy the asymptotic relation N(x,υ,t) ∼ eλtN0(x,υ), λ−1 being called the reactor “period,” it is again most convenient to reformulate the problem in lattice-theoretic language. One can then apply results of Oskar Perron, G. Frobenius, and R. Jentzsch on positive linear operators to prove the desired result. See LOGIC; RING THEORY. Garrett Birkhoff Bibliography. M. Anderson and T. Feil (eds.), Lattice-Ordered Groups: An Introduction, 1988; G. Birkhoff, Lattice Theory, 3d ed., 1967, reprint 1984; S. D. Cower (ed.), Universal Algebra and Lattice Theory, 1985; B. A. Davey and H. A. Priestley, Introduction to Lattices and Order, 2d ed., 2001; R. S. Freese and O. C. Garcia (eds.), Universal Algebra and Lattice Theory, 1983; G. Gierz et al., A Compendium of Continuous Lattices, 1980; G. Gratzer, General Lattice Theory, 2d ed., 1998; A. Hoffman, Continuous Lattices and Their Applications, 1985; S. Mac Lane and G. Birkhoff, Algebra, 3d ed., 1987.
Lattice vibrations The oscillations of atoms in a solid about their equilibrium positions. In a crystal, these positions form a regular lattice. Because the atoms are bound not
to their average positions but to the neighboring atoms, vibrations of neighbors are not independent of each other. In a regular lattice with harmonic forces between atoms, the normal modes of vibrations are lattice waves. These are progressive waves, and at low frequencies they are the elastic waves in the corresponding anisotropic continuum. The spectrum of lattice waves ranges from these low frequencies to frequencies of the order of 1013 Hz, and sometimes even higher. The wavelengths at these highest frequencies are of the order of interatomic spacings. See CRYSTAL STRUCTURE; VIBRATION; WAVE MOTION. At room temperature and above, most of the thermal energy resides in the waves of highest frequency. Because of the short wavelength, the motion of neighboring atoms is essentially uncorrelated, so that for many purposes the vibrations can be regarded as those of independently vibrating atoms, each moving about its average position in three dimensions with average vibrational energy of 3kT, where k is the Boltzmann constant and T the absolute temperature. The wave character of the vibrations is needed, however, to describe heat transport by lattice waves. Also, lattice vibrations interact with free electrons in a conducting solid and give rise to electrical resistance. The temperature variation at low temperatures provides evidence that this interaction is with waves. See ELECTRICAL RESISTIVITY. Lattice waves. In a discrete lattice the elastic equations of a continuum are replaced by a set of linear equations in the displacements u(x) of atoms at position x and their accelerations. In a uniform continuum the displacements can be expressed as a superposition of progressive waves, which are defined by their wave vectors, angular frequencies, and polarizations. There are three mutually perpendicular directions of polarization, in general not simply related to the direction of the wave vector. Also, the dependence of the frequency of the wave on the wave vector is linear for a given direction and polarization; the wave frequency is equal to the product of the wave vector and the phase velocity of the wave, which is determined by the elastic moduli and the density of the continuum, and depends on the direction of the wave vector but not its magnitude. See PHASE VELOCITY. In a discrete lattice with perfect periodicity, the normal modes are still progressive waves, and at low frequency they are the same as in the corresponding anisotropic continuum. However, the displacements of atoms are defined only at discrete lattice points, and when the wavelength is no longer large compared to the lattice spacing, the dependence of the wave frequency on the wave vector for a given direction and polarization is no longer linear. This is because the square of the wave frequency can be expressed as a sum of contributions from the force constants of linkages between an atom and its different neighbors, and each contribution contains a factor which varies as the square of the wave vector only when the wave vector is small. There are still three mutually perpendicular polarization directions for
Lattice vibrations every value of the wave vector, but these directions now rotate as the magnitude of the wave vector is changed. This adds complexity to the problem of calculating the frequencies, and numerical methods are required. The conditions that the lattice be stable and the undisplaced atoms be in positions of equilibrium require that the square of the wave frequency be real and positive for every mode. This implies that for every mode there is a mode of equal frequency and opposite velocity, even in lattices without inversion symmetry. If this were not so, the second law of thermodynamics could be circumvented. See THERMODYNAMIC PRINCIPLES. The dispersion relation, that is, the dependence of the frequency of the wave on the wave vector, determines the density of lattice modes per unit volume and frequency interval. Zone structure. Like a stroboscopic effect, but in space rather than time, the addition of a reciprocal lattice vector to the wave vector of a particular wave yields the same displacement pattern of the discrete lattice as that of the original wave. Thus, all the lattice waves can be identified with wave vectors in a fundamental zone, bounded by the perpendicular bisector planes of the lowest reciprocal lattice vectors. This fundamental zone contains as many wave-vector values as there are unit cells in the crystal. All real crystals contain at least two atoms per unit cell, and some more. Correspondingly there are, for each wave vector, not only the three polarization modes of the acoustic branches, for which the frequency goes to zero as the wave vector goes to zero (that is, as the wavelength becomes very long), but also the optical branches, which have nonvanishing frequencies everywhere in wave-vector space. When the wave vector equals 0, they describe the relative motion of atoms within a unit cell. In polar crystals they interact strongly with radiation, which accounts for their name. See BRILLOUIN ZONE; CRYSTAL ABSORPTION SPECTRA; CRYSTALLOGRAPHY. Localized vibrations. Dispersion curves of optical modes are frequently flat, that is, the frequency does not vary much over the zone. It thus appears, particularly in Raman spectra, that these modes of vibration have a single frequency, only slightly broadened by interactions with neighboring cells, as if the corresponding vibrations were localized within each cell. In principle, all modes of a harmonic lattice should be wavelike. However, in real crystals there are interactions between modes due to anharmonicities and defects; since these processes can be strong at high frequencies, a model of localized vibrations of atoms relative to each other in a unit cell may sometimes provide a better model. The acoustic modes are always wavelike in crystals. In amorphous solids and highly disordered crystals, they are wavelike at low frequencies but may be localized at the highest frequencies. Quantization and phonons. A normal mode is like a harmonic oscillator, and according to quantum mechanics its energy is quantized in integral units of hf, where h is the Planck constant. These quanta
are termed phonons in progressive lattice or elastic waves. In addition, each mode has a minimum or zero-point energy, hf/2; the sum over all normal modes of the zero-point energy is part of the formation energy of the crystal, and not regarded as thermal energy. See HARMONIC OSCILLATOR; NONRELATIVISTIC QUANTUM THEORY; PHONON; QUANTUM MECHANICS. Neutron diffraction. Neutrons are particles, and can also be considered as waves. In a crystal, a neutron can undergo Bragg diffraction, that is, elastic scattering, changing its wave vector by a reciprocal lattice vector, b. In addition, there are weaker inelastic scattering processes, involving absorption or emission of a phonon of frequency f and the wave vector, such that the energy of the neutron increases or decreases by hf and the wave vector of the neutron increases or decreases by the sum of the phonon wave vector and a reciprocal lattice vector. Velocity selection of neutrons scattered in a fixed direction allows their energy and momentum to be determined. Hence, it is possible to obtain the frequency of an interacting phonon of a given wave vector. Through a series of such measurements, the dispersion relation can then be obtained over the entire zone, and the forces between atoms and the density of modes, g(f), can be determined. See NEUTRON DIFFRACTION; SLOW NEUTRON SPECTROSCOPY. Heat transport and wave interactions. Each mode contains thermal energy, and carries a heat current equal to the product of its average energy in thermal equilibrium and the group velocity of the wave. A temperature gradient perturbs the thermal equilibrium, increasing the number of phonons flowing down the temperature gradient. Various interaction processes tend to restore equilibrium, resulting in a steady state with a heat current proportional to the temperature gradient. The constant of proportionality, called the thermal conductivity, involves the contribution to the specific heat per unit volume from waves at various frequencies, the phonon mean free paths or wave attenuation lengths, and the phonon velocities. This relationship applies to isotropic solids and can be generalized for anisotropy. See GROUP VELOCITY. Processes which limit the phonon mean free path are anharmonic interactions, scattering by lattice defects, and scattering by external and internal boundaries. The most important anharmonic interactions are three-phonon processes, in which a phonon is removed from a wave and a phonon is created in each of two other waves, or vice versa. Each such process must satisfy energy conservation as well as a wave-vector interference condition, so that, in the first case, for example, the frequency of the absorbed phonon is the sum of the frequencies of the two created phonons, and the wave vector of the absorbed phonons is the sum of the wave vectors of the two created phonons and a reciprocal lattice vector (b), which may be 0. Three-phonon processes cause the mean free path to decrease with increasing temperature. Scattering of lattice waves by defects increases with increasing frequency (f); its variation depends
on the nature of the defect. Scattering by external and internal boundaries is almost independent of frequency, thus dominating at low frequencies and hence at low temperatures. A study of the thermal conductivity of nonmetallic crystals as a function of temperature yields information about the defects present, and about the anharmonic nature of the interatomic forces in the crystal lattice. See CRYSTAL DEFECTS; THERMAL CONDUCTION IN SOLIDS. Paul G. Klemens Bibliography. N. W. Ashcroft and N. D. Mermin, Solid State Physics, 1976; R. Berman, Thermal Conduction in Solids, 1976; H. Bilz and W. Kress, Phonon Dispersion Relations in Insulators, 1979; G. Grimvall, Thermophysical Properties of Materials, 1986, revised 1999; C. Kittel, Introduction to Solid State Physics, 8th ed., 2005.
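The statement above that lattice-wave frequency ceases to vary linearly with wave vector at short wavelengths is commonly illustrated with the one-dimensional monatomic chain, a textbook model that is not treated explicitly in this article. The Python sketch below (force constant, mass, and spacing are arbitrary) evaluates its dispersion relation ω(k) = 2√(K/m)|sin(ka/2)| across the first zone.

import numpy as np

K = 10.0    # interatomic force constant (arbitrary units)
m = 1.0     # atomic mass (arbitrary units)
a = 1.0     # lattice spacing (arbitrary units)

k = np.linspace(-np.pi / a, np.pi / a, 201)               # wave vectors in the first zone
omega = 2.0 * np.sqrt(K / m) * np.abs(np.sin(k * a / 2.0))

print(omega[100:104])   # near k = 0 the frequency rises almost linearly from zero
print(omega[-1])        # maximum frequency 2*sqrt(K/m) at the zone boundary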
Launch complex The composite of facilities and support equipment needed to assemble, check out, and launch a rocket-propelled vehicle. The term usually is applied to the facilities and equipment required to launch larger vehicles for which a substantial amount of prelaunch preparation is needed. Small operational rockets may require similar but highly simplified resources on a much smaller scale. For these, the term launcher is usually used. See SPACE FLIGHT.
Prelaunch processing. A rocket vehicle consists of one or more stages and a payload. These elements usually are manufactured at different locations and shipped separately to the launch site. The assembly process consists of properly mating these elements in the launch configuration, assuring that all mechanical and electrical interconnections are properly made. Components that have been shipped separately from the main vehicle elements also are installed. These include items requiring special handling or safety precautions, such as batteries and ordnance devices. See GUIDED MISSILE; ROCKET STAGING; SPACECRAFT STRUCTURE. The checkout process consists of detailed testing of all elements of the vehicle to assure they are functioning properly and ready for launch. Checkout of individual components or subsystems may begin prior to or during assembly when easy access for repair or replacement is available. After completion of assembly, the vehicle is given a detailed overall test, usually called combined systems test or integrated systems test, to verify launch readiness. The launch process includes the final countdown and lift-off. During countdown, propellants are loaded, all vehicle systems are activated and given a final readiness check, ordnance devices are armed, and first-stage motors are ignited. The term facilities applies to the larger permanent and usually fixed structures on the complex.
Fig. 1. Overall perspective of a typical launch complex in which one blockhouse serves two launch pads.
The support equipment, often called ground support equipment (GSE), includes all ground equipment necessary to transport, service, test, control, and launch the vehicle. In this grouping are handling equipment, transporters, the service tower and erecting mechanisms, assembly and launch test equipment and consoles, weighing mechanisms, umbilical accessories, control and instrumentation cables, and other external equipment tailored to the missile and required for its preparation and operation. Launch-complex elements. A typical launch complex (Fig. 1) for a vehicle of the Thor-Delta class has two launch pads serviced by a single blockhouse and by common support equipment, providing some economy. The spacing between pads, blockhouse, and other elements of the complex is based on safety considerations. The overall size of a complex therefore depends primarily on the quantities and types of propellants used. Pad. The launch pad itself is usually a massive concrete structure designed to withstand the heat and pressure of the rocket exhaust. The pad provides physical support for the vehicle prior to launch, but it also has a number of other functions. A propellant-transfer room contains equipment for remote control of propellant loading. An environmental control system supplies conditioned air to the vehicle to maintain normal operating temperatures. A terminal-connection room provides for data and remote-control links between the vehicle and the launch control center or blockhouse. A flame deflector, located underneath the vehicle, diverts the rocket exhaust in a direction that does not cause harm. A high-pressure, high-volume water supply may be used to cool the flame deflector and otherwise minimize damage to the pad. The water supply also provides fire protection. Many pads have a hold-down-and-release mechanism that holds the vehicle in place during ignition and releases it when full thrust is built up (Fig. 2). The hold-down feature also can be used for test firings of the rocket motors. The weighing system is a particularly important part of on-stand equipment; it can be used for weighing the missile, determining the amount of propellants loaded, measuring the thrust in static tests, and establishing acceptable thrust performance prior to launching of a missile. Weighing mechanisms are generally load-cell strain gages. Umbilical connections. During the checkout and countdown phases until just prior to lift-off, the vehicle is serviced from the ground by electrical and mechanical connections. The electrical connections provide electrical power, control signals for remote operation, and data links. The mechanical connections provide for propellant loading, high-pressure gas transfer, and air conditioning. These connections usually are supported by an umbilical tower located on the pad adjacent to the vehicle. Attachment to the vehicle is made by quick-disconnect devices that are pulled away just after rocket motor ignition but prior to lift-off.
Fig. 2. Launch pad with Soyuz spacecraft and rocket. Gantry and hold-down mechanisms fold back at proper moments in launch sequence. (Tass, from Sovfoto)
Service structure. During assembly and checkout, technicians require access to all parts of the vehicle. This is provided by a service structure, or gantry, which can be moved up to the vehicle when needed and pulled away prior to launch. The service
Fig. 3. Minuteman ICBM on the launch pad with its service structure in place. (Boeing Aircraft)
Fig. 4. Upper part of a Titan 3 service structure showing the enclosure which provides shelter for the payload. (TRW Systems)
structure has movable platforms to provide access to all levels of the vehicle (Fig. 3). It may also have means for partially or totally enclosing the vehicle to provide a clean, air-conditioned environment. Figure 4 shows an environmental enclosure near the top of a Titan 3 service structure intended for sheltering only the payload. Figure 5 shows the inside of this structure with the hole through which the vehicle will project covered with a safety net. When the vehicle is assembled on the launch pad, the service structure also provides such necessary
Fig. 5. Inside view of the structure of the enclosure shown in Fig. 4. Movable platforms at upper left are temporarily folded away. (TRW Systems)
mechanical equipment as cranes and hoists, as well as electrical power, gas supplies, and other utilities needed for the process. Fueling systems. The launch complex for liquidfueled vehicles has fuel and oxidizer storage and loading systems. Cryogenic (low-temperature) propellants such as liquid oxygen and liquid hydrogen contribute greatly to the complexity of a launch complex. The cryogens must be stored close enough to the point of use so they will not vaporize in the transfer lines but far enough back from the launch point so that the takeoff blast will not endanger the storage tanks. See PROPELLANT; ROCKET PROPULSION. Extreme cleanliness is necessary in handling liquid oxygen, the most common of oxidizers. Tanks and lines must be thoroughly clean because, under proper conditions, hydrocarbons may combine with an oxidizer to cause a fire explosion. Filters are employed in the system to trap particles which may have been accidentally introduced. Liquid oxygen has a normal temperature of −297◦F (−183◦C); to minimize losses, the cryogenic tanks are double-jacketed, with a vacuum in the annular space. The entire system is usually made of stainless steel for cleanliness and strength at low temperatures. Liquid hydrogen, used as a fuel in some vehicles, is even more difficult to handle. It not only has a much lower temperature (−423.2◦F or −252.9◦C), but hydrogen systems are highly susceptible to leaks. When mixed with air, hydrogen is extremely explosive. Noncryogenic propellants, such as hydrazine and fuming nitric acid, are used in some vehicles. These are sometimes called storable propellants because they do not boil off at normal ambient temperatures as do cryogens. However, they pose other handling problems because they ignite on contact with each other and are extremely toxic, and fuming nitric acid is highly corrosive. Fixed fuel storage and pumping areas are ordinarily located on the opposite side of the launch stand from the oxidizer area for safety. For many fueling facilities, the more dangerous and corrosive propellants are handled and transferred by means of specially designed trailers. Even liquid oxygen may be transported in field trailers with their own pumping and pressurization equipment and introduced into the launch vehicle through a loading manifold. Both fuel and oxidizer systems are usually operated remotely from the blockhouse. A method of dumping propellants is usually provided. Liquid oxygen may be dumped into a concrete evaporation pond, sometimes called a burn pond, on the launch complex. Fuel is rarely dumped on the ground; it is either fed back into its storage tank or piped to a fuel-holding pond where it can be disposed of later. More elaborate closed storage is required for highly toxic or otherwise dangerous propellants. Control center. The launch operation is controlled from a launch control center. The older control centers, used for uncrewed space missions, are often contained in a blockhouse on the launch complex, a few hundred meters from the launch pad, as in Fig. 1.
Fig. 6. External view of typical blockhouse for uncrewed missions. Massive concrete construction protects workers during launch.
The blockhouse itself is a massive concrete structure (Fig. 6) designed to protect personnel from a possible inadvertent impact of the vehicle on the blockhouse with consequent explosion or fire. Thickness of the walls, sometimes 5–7 ft (1.5–2 m) of concrete, is designed to withstand the force of the highestorder explosion that the vehicle propellants could produce. Other precautions, such as baffled underground escape hatches and tunnels and remote air intakes for air conditioning, may be provided. The launch control center contains both control and monitoring systems. The control systems usually are consoles or panels that enable the operator to remotely operate and control specific functions on the launch pad and in the vehicle. For example, the propellant-transfer console enables the operator to load propellants from the storage tanks to the vehicle. Within the propellant system and the vehicle are sensors that measure flow rates, temperatures, pressures, and other parameters critical to the propellant-loading process. These measurements are transmitted via cable back to the control console, where they are displayed. The operator controls the process and also monitors it to assure satisfactory performance. The launch control center used for crewed space shuttle missions is located 5 mi (8 km) from the launch pads, far outside the blast area, and therefore does not require a blockhouse. This facility contains four firing rooms, each equipped with consoles which contain controls and displays required for checkout and launch (Fig. 7). All four firing rooms are operated by NASA, but two are used to support dedicated Department of Defense missions. The many controls and instruments on the consoles and panels connect to a terminal room on the floor below. Connections are made there to the cables which lead to a similar terminal room at the pad, where in turn connections are made to the vehicle. Closed-circuit television allows direct viewing of many critical operations. Because many people participate in a launch operation, the communications system that links them together is important to operational success. An operations intercom system may have a dozen or more channels, each channel having a specific assignment. For example, channel 1 may be assigned to stage 1 propulsion, channel 2 to guidance, channel 3 to instrumentation, channel 4 to pad safety, and so on.
This scheme of assignment allows each subsystem crew to communicate with each other. The subsystem crew chief also has a link with the system test conductor, for example, with the stage 1 test conductor. The stage 1 test conductor has an additional channel to the chief test conductor, who has overall launch responsibility. The chief test conductor also has direct links to important functions outside the launch complex, such as range control, range safety, and weather forecasting. Checkout computers. The use of digital computers is important in checkout and launch operations. A typical comprehensive test or launch countdown may require many hundreds of test sequences, each switched on in proper sequence. In addition, many hundreds or even thousands of data measurements
Fig. 7. Typical firing room in the launch control center used for crewed space shuttle missions. (NASA)
Fig. 8. Space shuttle Vehicle Assembly Building on Complex 39 at the Kennedy Space Center. The launch control center is at lower left. The space shuttle is on the mobile launch platform in front of the building. (NASA)
must be made and evaluated to verify the test results. If this were done manually by human operators, the total time required would be unreasonably long. Computers can control test sequences and data readouts many times faster and with less chance for error. Computers can also promptly evaluate whether a measurement is within a specified tolerance and, if an out-of-tolerance measurement occurs, notify the test conductor. Without the speed and accuracy that computers provide, the checkout and launch of a complex space vehicle, such as a crewed space shuttle, would be virtually impossible. Many rocket vehicle components are deliberately designed for a short life but high performance to achieve weight savings. Such components could wear out faster than they could be checked out in a manual operation. See COMPUTER; DIGITAL COMPUTER. Mobile launcher. Many launch complexes provide for assembly of the rocket vehicle on the launch pad, using the service structure as the assembly facility. Large, complex vehicles such as a space shuttle may require an assembly time of many weeks or months.
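The limit-checking role of checkout computers described above can be sketched in a few lines. This is not any actual launch-processing software; the measurement names and tolerance bands below are hypothetical, and the notification is reduced to a printed flag.

```python
# Hypothetical sketch of automated limit checking during a countdown test.
# Each measurement is compared against its allowed band; out-of-tolerance
# readings are flagged for the test conductor.

limits = {                      # hypothetical parameter: (low, high)
    "lox_tank_pressure_kPa": (300.0, 350.0),
    "fuel_flow_rate_kg_s":   (120.0, 140.0),
    "pad_water_supply_kPa":  (800.0, 900.0),
}

readings = {                    # hypothetical telemetry sample
    "lox_tank_pressure_kPa": 332.0,
    "fuel_flow_rate_kg_s":   118.5,
    "pad_water_supply_kPa":  845.0,
}

for name, value in readings.items():
    low, high = limits[name]
    if low <= value <= high:
        print(f"{name}: {value} OK")
    else:
        print(f"{name}: {value} OUT OF TOLERANCE ({low}-{high}) -- notify test conductor")
```

A real system would apply the same comparison, at machine speed, to the hundreds or thousands of measurements mentioned above.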
Fig. 9. Fully assembled space shuttle on the mobile launcher leaving the Vehicle Assembly Building. (NASA)
For them, the launch site is inadequate because the service structure does not usually provide adequate shelter from weather, salt-air corrosion, and other environmental factors; for safety, service shops, stockrooms, engineering offices, and other needed services cannot be located near the launch pad; and the relatively long vehicle preparation time makes the pad unavailable for launchings, thus limiting launch schedules. To overcome these problems, the mobile-launch concept has been used as the basis for launchcomplex design on several programs (Titan 3, ApolloSaturn 5, and the space shuttle are notable examples). In this procedure, the vehicle is assembled and checked out in specially designed facilities that provide shelter as well as other facilities and support equipment needed for the process. Because highly volatile liquid propellants are not loaded in the assembly building, several vehicles can be safely processed at once, allowing much support equipment to be shared. Usually the vehicle is assembled directly on a transporter which, after the vehicle is assembled and checked out, carries the vehicle to the launch pad and then serves as a launch platform. The space shuttle Vehicle Assembly Building at the Kennedy Space Center (Fig. 8) is 525 ft (160 m) high, 716 ft (213 m) long, and 518 ft (158 m) wide, making it one of the largest (in volume) buildings in the world. A low-bay area is used for overhaul and maintenance of the space shuttle’s main engine and serves as a holding area for solid rocket forward assemblies and aft skirts. Solid rocket booster segments and assemblies are processed in adjacent facilities and transferred to the Vehicle Assembly Building for mating with the rest of the complete shuttle vehicle. A high-bay area has four bays, two on each side of the building. Two of the bays are used to store the shuttle’s large external fuel tank. The other two bays are used to stack and integrate the complete space shuttle vehicle. Retractable platforms in each bay provide direct access to the vehicle at all work levels. Diagonal to the vehicle assembly building is the launch control center. Two launch pads are located about 4 mi (6 km) distant. The platform which supports the space shuttle on its mobile launcher (Fig. 9) is 160 ft (49 m) long, 135 ft (41 m) wide, and 25 ft (8 m) deep. Two levels within the mobile launcher contain checkout equipment, including a checkout computer, propellantloading accessories, electrical equipment, and hydraulic servicing units. The mobile launcher serves as the basic support structure for the vehicle during the entire assembly checkout and launch process. The structure, with vehicle aboard, is moved the 4 mi (6 km) from the Vehicle Assembly Building to the launch pad by means of a huge tractorlike device which moves under the mobile launcher, lifts it, and carries it along a specially built roadway. Space shuttle. Space shuttle operations require two facilities not usually associated with the typical launch complex. The orbiter landing facility is a 3-mi-long (5-km), 300-ft-wide (90-m) runway used for landing operations. The orbiter processing facility
Fig. 10. Orbiter processing facility, used to refurbish orbiter vehicles after return from space and to load and unload heavy cargoes. (NASA)
(Fig. 10) is used to offload pyrotechnic devices, ordnance items, and toxic fuels, and to refurbish and prepare orbiter vehicles for relaunch after they return from space. The facility is also used to load and unload major payload elements. When vehicle refurbishment and payload operations are completed, the orbiter vehicle is transferred from the processing facility to the assembly building, where it is integrated with the other flight elements, that is, the solid rocket motors and the external fuel tank. See SPACE SHUTTLE. Solid-propellant complex. Solid-propellant rocket vehicles require, in general, a launch complex similar to those just described for liquid-propellant vehicles. The most important difference is the absence of facilities and equipment needed for propellant handling and transfer. This greatly simplifies design of the complex. However, solid-propellant stages are a severe fire or explosion hazard at all times during prelaunch operations. Extreme precautions are taken to eliminate the possibility of accidentally igniting the solid propellants by static electricity, sparks from electrical equipment or machinery, or lightning. Work areas and storage areas are usually separated by revetments. Operational systems. The launch complex for an operational ballistic missile weapon has essentially the same features as the complexes described above. An operational system is one which is deployed and ready for use by the armed forces. The major differences in the launch complexes are therefore concerned with the operational requirement to maintain a large number of missiles in a launch-ready condition at all times, to have a high probability of successful launches in a short time, and to have a high probability of striking designated targets. Facilities and equipment for operational systems are designed to have extremely high reliability. Much of the instrumentation used in research-anddevelopment launches is eliminated, and only that which is essential to monitoring launch readiness is retained. A single launch control center may be used
for operational control of a large number of missiles, with launch sites for the individual missiles well dispersed around the complex. For weapon systems, solid propellants and storable propellants are much favored over cryogens, which are difficult to store and handle and which cannot be loaded into the vehicle until just prior to launch, greatly increasing the reaction time of the system. Large weapon-system launch sites may be hardened; that is, they are designed to withstand all but a direct hit from a nuclear weapon in the event of an enemy attack. Hardening usually consists of locating the entire complex underground. The launch pad is a deep hole, called a silo, which contains the vehicle (Fig. 11). Tunnels provide limited access for
Fig. 11. Minuteman ICBM emplaced in silo.
umbilicals and at key work levels. As Fig. 11 shows, a technician can be lowered from the surface for additional access, although, on an operational system, access requirements would be minimal. Perhaps the most distinctive type of operational launch complex is the Poseidon and Trident-type of submarine. These nuclear-powered vessels can launch 16 Poseidon and 24 Trident intermediate-range ballistic missiles with nuclear warheads while submerged. The relatively short, squat solid-propellant missiles are mounted vertically in injection tubes in a compartment of the submarine. At launch, compressed air ejects the missile to the surface, where the rocket motor is ignited. Complete checkout gear and targeting computers are contained in the submarine. Successful use of the submarine-launched missiles depends on the ability to obtain an accurate geodetic location of the submarine relative to its target. This can be accomplished to a high degree of accuracy anywhere in the world by means of navigation satellites. See NAVAL ARMAMENT; SATELLITE NAVIGATION SYSTEMS. Commercial launch services. With the development of the Space Transportation System, anchored by the space shuttle, NASA began a gradual phaseout of its expendable launch vehicle fleet, consisting of the Atlas-Centaur and the Delta. The last of these vehicles under NASA control were launched in 1989. Under an agreement with the Department of Defense, NASA returned control of Launch Complex 17 (Delta) and Launch Complex 36 (Atlas-Centaur) on Cape Canaveral to the U.S. Air Force. The manufacturers of the Delta and the Atlas-Centaur have agreements with the Air Force for use of these complexes for commercial launch activities involving these vehicles. A similar agreement is in effect between the maker of the Titan vehicle and the Air Force for the use of Complex 40 on Cape Canaveral for commercial launch operations. NASA contracts with the commercial launch organizations for payloads that are not suitable for the space shuttle and that do not require crew-tending during launch, deployment, or orbital operations. Karl E. Kristofferson Bibliography. Kennedy Space Center Space Transportation System Ground Operations Plan, KSTSM-09, Rev. 4, 1979; Launch Site Accommodations Handbook for STS Payloads, K-SYSM14.1, Rev. D, 1989; Moonport, SP-4204, 1978; National Aeronautics and Space Administration: Space Transportation System Facilities and Operations, Kennedy Space Center, Florida, K-STSM-01, App. A, Rev. A, April 1984; National Space Transportation System Reference, vol. 1: Systems and Facilities, vol. 2: Operations, 1988.
Laurales An order of flowering plants composed of seven eumagnoliid families of tropical tree species that are important ecologically; some are shrubs. They include in total about 2500 species. They are most
closely related to Magnoliales, from which they differ in their partly inferior ovaries and their biaperturate or inaperturate pollen, and then to Winterales and Piperales. Lauraceae (the laurel or cinnamon family) are the best known and largest, but Monimiaceae and its segregates (Atherospermataceae and Siparunaceae) are also important. Nearly all species have aromatic oils, which are important spices, perfumes, and medicines; their flowers are for the most part small and often arranged in distinct whorls, but some such as those of Calycanthaceae are large and much like those of Magnoliales in that parts are arranged spirally and intergrade. Many species are important as timbers. Cinnamon and camphor come from Cinnamomum species, sassafras tea was formerly made from the roots of Sassafras albida (now discouraged due to its suspected carcinogenic nature), and avocado comes from Persea americana; several genera are cultivated as ornamentals, such as Calycanthus (Carolina allspice) and Chimonanthus (wintersweet; both Calycanthaceae); and Laurus (bay laurel) and Lindera (spice bush; both Lauraceae). See AVOCADO; CAMPHOR TREE; EUMAGNOLIIDS; MAGNOLIALES; MONOCOTYLEDONS; PIPERALES; SASSAFRAS. Mark Chase
Lava Molten rock material that is erupted by volcanoes through openings (volcanic vents) in the Earth’s surface. Volcanic rock is formed by the cooling and solidification of erupted lava. Beneath the Earth’s surface, molten rock material is called magma. All magmas and lavas consist mainly of a liquid, along with much smaller amounts of solid and gaseous matter. The liquid is molten rock that contains some dissolved gases or gas bubbles; the solids are suspended crystals of minerals or incorporated fragments of preexisting rock. Rapid cooling (quenching) of this liquid upon eruption forms a natural volcanic glass, whereas slower cooling allows more minerals to crystallize from the liquid and preexisting minerals to grow in size. The dissolved gases, a large proportion of which are lost on eruption, are mostly water vapor, together with lesser amounts of carbon, sulfur, chlorine, and fluorine gases. With very rare exception, the chemical composition of the liquid part of magmas and lavas is dominated by silicon and oxygen, which form polymers or compounds with other common rock-forming elements, such as aluminum, iron, magnesium, calcium, sodium, potassium, and titanium. Properties. Viscosity is the principal property which determines the form of erupted lava. It is mainly dependent on chemical composition, temperature, gas content, and the amount of crystals in the magma. Liquid lava with basaltic composition (such as in Hawaii)—relatively low in silicon and aluminum and high in iron, magnesium, and calcium— has higher fluidity (lower viscosity) compared with lava of rhyolitic or dacitic composition (such as at
Mount St. Helens, Washington), with higher abundance of silicon and aluminum but lower amounts of iron, magnesium, and calcium. High temperature and gas content of the liquid lava, combined with low crystal abundance, also contribute to increased lava fluidity. Measured maximum temperatures of basaltic lava (1150–1200◦C; 2100–2190◦F) are higher than those for andesitic and more silicic lavas (720–850◦C; 1330–1560◦F). Very fluid basaltic lavas can flow great distances, tens to hundreds of kilometers, from the eruptive vents; in contrast, more silicic lavas travel much shorter distances, forming stubby flows or piling up around the vent to form lava domes. See VISCOSITY. Products. Volcanic products formed by erupted lava vary greatly in size and appearance, depending on volcano type, lava composition, and eruptive style. Most lava products are either lava flows, formed during nonexplosive eruptions by cooling and hardening of flowing lava; or fragmental (pyroclastic) products, formed during explosive eruptions by the shredding apart and ejection into the air of liquid lava. There are three commonly recognized types of lava flows: pahoehoe, aa, and block lava. Pahoehoe is the Hawaiian name given worldwide to solidified fluid lava whose surface is smooth, gently rolling, and wrinkled to a ropey appearance (see illus.); in contrast, aa (also a Hawaiian term) has a very rough, jumbled surface composed of jagged fragments and blocks. Block lava is made up of angular blocks, but bigger and less jagged than those in aa lava. Fragmental volcanic products differ greatly in size and form, depending on the type of lava ejected and the explosive force of the eruption. All such materials are called pyroclastic. If the explosive force is relatively weak, the lava fragments tend to be larger in size. In contrast, during violently explosive eruptions, the ejected lava is shattered into smaller fragments. However, fragment size in any given explosive eruption can range from room-size blocks to dust-size particles; the largest and heaviest fragments are found closest to the eruptive vent. A
Pahoehoe lava (in Kilauea, Hawaii) has a smooth and wavy surface formed by a thin layer of cooler lava at the surface. The surface layer is pushed into folds from the faster-moving, hot fluid lava beneath. (Photograph by J.D. Griggs, U.S. Geological Survey, June 15, 1989)
great variety of terms have been coined by geologists to describe pyroclastic rocks and deposits, depending on the size, shape, and other characteristics of the fragments. Whether erupted explosively or nonexplosively, basaltic lava generally solidifies to form dark-colored volcanic rocks, whereas more silicic lavas (for example, andesitic, dacitic, and rhyolitic) typically form rocks much lighter in color. See ANDESITE; DACITE; IGNEOUS ROCKS; MAGMA; PYROCLASTIC ROCKS; RHYOLITE; TUFF; VISCOSITY; VOLCANIC GLASS; VOLCANO. Robert I. Tilling Bibliography. R. W. Decker and B. B. Decker, Mountains of Fire, Cambridge University, New York, 1991; R. W. Decker and B. B. Decker, Volcanoes, 3d ed., W. H. Freeman, New York, 1998; P. Francis, Volcanoes: A Planetary Perspective, Oxford University Press, Oxford, 1993; G. A. Macdonald, Volcanoes, Prentice Hall, Englewood Cliffs, NJ, 1972; H. Sigurdsson et al. (eds.), Encyclopedia of Volcanoes, Academic Press, San Diego, 2000.
Lawn and turf grasses Grasses that were once almost exclusively for pasture and meadow and now furnish ground cover for lawns, sportsfields, industrial parks and roadsides. Acreage of such mowed turf in the United States is reckoned in the tens of millions; in many states the turfgrass industry ranks among the top two or three “agricultural” pursuits. The same species important for grazing are also important for lawns. Grass is well adapted to repeated defoliation (grazing or mowing) because its meristem or growing point is basal, and for the most part only foliage tip growth is sacrificed. Modern turfgrasses. Although much adventive (“natural”) grass is still maintained as lawn, the trend has been toward the breeding and selection of cultivars suited specifically for fine turf, not pasture or meadow. These modern turfgrasses, compared with their “common” antecedents, are typically more decumbent (therefore, retain more green foliage after mowing because of this low stature), are denser by virtue of abundant tillering (hence are more resistant to weed invasion and generally wear better), and are more resistant to pest attack (they are especially screened for disease tolerance). Other features of concern to the turfgrass breeder are seed yields (this correlates a bit negatively with attractiveness), rich color and good looks (fine texture is generally preferred), and ability to spread well (abundant production of rhizomes or stolons). Quickness of seed sprouting, early vigor of the seedling, reluctance to thatch, wear quality, compatibility in blends and mixtures, climatic hardiness, and many other traits are also considered. The production and marketing of improved turfgrass cultivars is not a static endeavor. In time, pests adapted to once-unaffected cultivar evolve, so new selections are constantly sought. Fortunately, the reservoir of turfgrass breeding lines is not as likely
to be exhausted as that of improved agricultural crops, for a diversified germ-plasm bank exists in the many common turfs and the "hedgerows" of suburbia. The parade of new cultivars can be expected to continue, with older varieties dropping out (often because of uneconomical seed yields) and being replaced by still better selections. As an example, representative northern lawn grass cultivars (Lawn Institute Variety Review Board acceptances) are given below:
1. Kentucky bluegrass (Poa pratensis): a bit slow from seed, but strong sod, easily cared for. Adelphi, America, Arboretum, Birka, Bonnieblue, Eclipse, Emmundi, Fylking, Glade, Majestic, Merion, Merit, Monopoly, Nugget, Plush, Ram I, Sydsport, Touchdown, Vantage.
2. Perennial ryegrass (Lolium perenne): quick starting, as elegant but not so widely adapted as bluegrass; most are polycrosses of several parental bloodlines. Blazer, Citation, Derby, Diplomat, Fiesta, Manhattan, NK-200, Omega, Pennant, Pennfine, Regal, Yorktown II.
3. Fine fescue (Festuca rubra, in variety): good for shade, drought, and low fertility, but often patchy under warm humidity. Agram, Banner, Ensylva, Highlight, Koket, Waldorf.
4. Bentgrass (Agrostis spp.) and rough bluegrass (Poa trivialis): grows in moist shade; mostly for frequently mowed, elegant turfs such as a golf green. Highland and Exeter colonial bent; Emerald and Prominent creeping bents; Sabre rough bluegrass.
5. Tall fescue (Festuca arundinacea): "turf-type" polycrosses for the difficult border states' "transition zone" climate. Clemfine, Falcon, and Rebel.
Major lawn grass zones of the United States (Key: northern lawn grasses; northern lawn grasses only with irrigation; southern lawn grasses; southern lawn grasses only with irrigation).
Distribution. The illustration indicates the general distribution of the major turfgrasses in the United States. The northern species are listed above; southern ones are primarily Bermuda grass (Cynodon) and manilagrass (Zoysia) for the upper South; centipede (Eremochloa), St. Augustine (Stenotaphrum), and bahia (Paspalum) for the deeper South. Tall fescue (Festuca arundinacea) is often planted in the "transition zone" where the South meets the North; it is coarse and hardly elegant, but survives well where finer types do not. As one would suppose, the northern species are at their best during cooler portions of the year. They are especially favored by planting and fertilization in
autumn. Southern species are of a subtropical nature and grow best during the spring and summer. Turfgrass management. Management practices include mowing, timely fertilization, perhaps irrigation (certainly in arid climates), and probably some measure of pest control. Mowing will vary with the kind of grass: reel mowers are most suitable for stoloniferous species that are mowed low (bentgrasses and most southern species); the more versatile and less complicated rotaries work better for taller turf. High mowing generally favors turfgrass, since more photosynthetic foliage is retained and the grass is more competitive against weeds. Mowing should be frequent enough so that no more than half of the green foliage is sacrificed at any one clipping. Fertilization has become costly, and self-reliant cultivars adapted to minimal maintenance are increasingly popular. Still, 2 lb of nitrogen per 1000 ft2 (1 kg/100 m2) annually (the experts would recommend at least twice this) should be provided for typical lawns and parceled out during seasons appropriate to growth patterns. Fescues and centipede tolerate low fertility well. Sources of nitrogen that feed out the nutrient gradually are now available in a number of forms, and are to be recommended. In most cases a complete fertilizer (containing phosphorus and potassium as well as nitrogen) would be called for, although for turf the chief and most abundant nutrient should be nitrogen. A soil test will show if phosphorus, potassium, or other cations are adequate. A simple pH test indicates the need for correcting acidity by liming (the humid East) or alkalinity with sulfur (the western plains). See FERTILIZER. Irrigation is essential for survival of traditional turfgrasses in many areas, but prairie species such as buffalograss (Buchloë) and grama (Bouteloua) often survive without irrigation. Fortunately, the engineering of lawn irrigation devices has made tremendous strides, and automatic systems are available which apply water precisely according to a set schedule. Soils vary in their ability to absorb and retain water, and any irrigation system should be adjusted to their capacities. Sandy soils accept insoak rapidly but may retain only an inch or less in the top foot of soil; they should be watered briefly but frequently. Heavy soils (clays) may accept insoak only slowly, but hold up to 3 in. (8 cm) of water in the pore system of the top foot of soil; they must be watered slowly but less frequently. There is often a tendency to overwater. Actually, letting the lawn dry out occasionally almost to the point of wilting can be therapeutic, helping to forestall hydrophilic weeds (such as nutsedge, Poa annua) and disease. See IRRIGATION (AGRICULTURE). Pest control. Weeds are perhaps the most evident pests of lawns. An excellent array of simple-to-use herbicides is available that allows one to selectively control most broadleaf weeds (dicotyledons) in grass, annual grasses in grass, but not perennial grasses in turfgrass (unless all vegetation is killed by a general herbicide and the treated spots reseeded). See HERBICIDE.
Insect control is more of a problem, because of the upsetting influence of insecticides on the environment. Insect damage tends to be epidemic, from sod webworms, chinch bugs, soil grubs, and so on. Treatment with a registered insecticide, according to directions, can generally rectify the trouble. See INSECTICIDE. Disease inocula are ever present in a lawn, awaiting proper weather, vulnerable physiology of the grass, appropriate season, lawn stress, and so on. At least a low incidence of diseases such as leaf spot, caused by Helminthosporium spp., is almost inevitable in cool spring weather, whereas dollar spot, caused by Sclerotinia homoeocarpa, and Fusarium-induced diseases are generally manifest in hot weather. Innumerable other diseases can also attack turfgrasses. Most are more intense on heavily fertilized turf, but some (such as dollar spot) are less serious under generous fertilization. Turfgrass diseases are difficult for the amateur to recognize, and even the pathologist is often not sure of causality without undertaking laboratory study and reinoculation of the species. The concept of ecological “balance” among fungi, with serious disease erupting only when conditions get out of balance, is becoming more accepted. Thus Fusarium may be beneficial in the normal lawn microcosm, aiding in decomposition of thatch and holding other diseases in check, although capable of damaging a lawn when a susceptible host and the right conditions coincide. Professional grounds managers who must maintain immaculate turf generally undertake preventive fungicidal sprayings at times of year when disease is likely to erupt. Even then fungi may build resistance to a particular fungicide, so that alternating a conventional contact treatment (which kills spores and mycelia on the surfaces) with the newer systemic fungicides such as benomyl is often advocated. See FUNGISTAT AND FUNGICIDE. For a homeowner, it is probably too late to stop the inroads of a disease by the time that symptoms are recognized, and in most cases the disease will have to run its course or it may be terminated by changes in weather without any fungicidal treatment. In any event, a homeowner seldom has either the apparatus or the skill to apply a fungicide accurately and frequently enough so that preventive measures work. The most effective measure is to plant disease-resistant lawn grass cultivars. See GRASS CROPS; PLANT PATHOLOGY. Robert W. Schery Bibliography. J. B. Beard, Turfgrass Science and Culture, 1973; W. H. Daniel and R. P. Freeborg, Turf Manager’s Handbook, 1979; R. W. Schery, Lawn Keeping, 1976; A. J. Turgeon, Turfgrass Management, 5th ed., 1998.
Lawrencium A chemical element, symbol Lr, atomic number 103. Lawrencium, named after E. O. Lawrence, is the eleventh transuranium element; it completes the
actinide series of elements. See ACTINIDE ELEMENTS; PERIODIC TABLE; TRANSURANIUM ELEMENTS. The nuclear properties of all the isotopes of lawrencium from mass 255 to mass 260 have been established. 260Lr is an alpha emitter with a half-life of 3 min and consequently is the longest-lived isotope known. Albert Ghiorso Bibliography. A. Ghiorso et al., New element: Lawrencium, atomic number 103, Phys. Rev. Lett., 6(9):473–475, 1961; S. Hofmann, On Beyond Uranium: Journey to the End of the Periodic Table, 2002; G. T. Seaborg and W. D. Loveland, The Elements Beyond Uranium, 1990.
Lawson criterion A necessary but not sufficient condition for the achievement of a net release of energy from nuclear fusion reactions in a fusion reactor. As originally formulated by J. D. Lawson, this condition simply stated that a minimum requirement for net energy release is that the fusion fuel charge must combust for at least enough time for the recovered fusion energy release to equal the sum of energy invested in heating that charge to fusion temperatures, plus other energy losses occurring during combustion. The result is usually stated in the form of a minimum value of nτ that must be achieved for energy break-even, where n is the fusion fuel particle density and τ is the confinement time. Lawson considered bremsstrahlung (x-ray) energy losses in his original definition. For many fusion reactor cases, this loss is small enough to be neglected compared to the heating energy. With this simplifying assumption, the basic equation from which the Lawson criterion is derived is obtained by balancing fusion energy release against heat input to the fuel plasma. Assuming hydrogenic isotopes, deuterium and tritium at densities nD and nT respectively, with accompanying electrons at density ne, all at a maxwellian temperature T, one obtains Eq. (1), nDnT⟨σv⟩Qτηr ≥ (3/2)kT(nD + nT + ne)/ηh
(1)
where the recovered fusion energy release is set equal to or greater than the energy input to heat the fuel. Here ⟨σv⟩ is the product of reaction cross section and relative ion velocity, as
averaged over the velocity distribution of the ions, Q is the fusion energy release, ηr is the efficiency of recovery of the fusion energy, ηh is the heating efficiency, and k is the Boltzmann constant. For a fixed mixture of deuterium and tritium ions, Eq. (1) can be rearranged in the general form of Eq. (2), nτ ≥ [T/⟨σv⟩]F(ηr, ηh, Q)
(2)
For a 50-50 mixture of deuterium and tritium (see illus.), the minimum value of T/⟨σv⟩ occurs at about 25 keV ion kinetic temperature (mean ion energies of about 38 keV). Depending on the assumed efficiencies of the heating and recovery processes, the lower limit values of nτ range typically between about 10¹⁴ and 10¹⁵ cm⁻³ s. These values serve as a handy index of progress toward fusion, although their achievement does not alone guarantee success. Under special circumstances (unequal ion and electron temperatures, unequal deuterium and tritium densities, and nonmaxwellian ion distributions), lower nτ values may be adequate for nominal break-even. The discussion up to this point has been oriented mainly to situations in which the fusion reactor may be thought of as a driven system, that is, one in which a continuous input of energy from outside the reaction chamber is required to maintain the reaction. Provided the efficiencies of the external heating and energy recovery systems are high, a driven reactor generally would require the lowest nτ values to produce net power. An important alternative operating mode for a reactor would be an ignition mode, that is, one in which, once the initial heating of the fuel charge is accomplished, energy directly deposited in the plasma by charged reaction products will thereafter sustain the reaction. For example, in the
Typical plot of minimum value of nτ necessary for net release of energy versus ion kinetic temperature in a mixture containing equal amounts of deuterium and tritium. (After J. M. Hollander, ed., Annu. Rev. Energy, 1:213–255, 1976)
D-T reaction, approximately 20% of the total energy release is imparted to the alpha particle; in a magnetic confinement system, much of the kinetic energy carried by this charged nucleus may be directly deposited in the plasma, thereby heating it. Thus if the confinement time is adequate, the reaction may become self-sustaining without a further input of energy from external sources. Ignition, however, would generally require nτ products with a higher range of values, and is thus expected to be more difficult to achieve than the driven type of reaction. However, in all cases the Lawson criterion is to be thought of as only a rule of thumb for measuring fusion progress; detailed evaluation of all energy dissipative and energy recovery processes is required to properly evaluate any specific system. See NUCLEAR FUSION; NUCLEAR REACTOR; PLASMA (PHYSICS). Richard F. Post
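As a worked illustration of Eqs. (1) and (2), the sketch below estimates a break-even nτ for a 50-50 deuterium-tritium plasma. Setting nD = nT = n/2 and ne = n in Eq. (1) gives nτ ≥ 12kT/(⟨σv⟩Qηrηh); the reactivity used is only a rough round number near 25 keV, and the efficiencies are assumed values, so the output should be read as an order-of-magnitude check against the 10¹⁴–10¹⁵ cm⁻³ s range quoted above.

```python
# Order-of-magnitude sketch of the Lawson break-even condition for a
# 50-50 D-T plasma: n*tau >= 12*k*T / (<sigma v> * Q * eta_r * eta_h),
# obtained from Eq. (1) with nD = nT = n/2 and ne = n.
# The reactivity below is a rough illustrative value, not tabulated data.

kT_keV    = 25.0                 # ion temperature, keV
kT_J      = kT_keV * 1.602e-16   # convert keV to joules
sigma_v   = 5.0e-22              # assumed D-T reactivity near 25 keV, m^3/s
Q_J       = 17.6e6 * 1.602e-19   # D-T fusion energy release, joules
eta_r     = 0.35                 # assumed recovery efficiency
eta_h     = 0.9                  # assumed heating efficiency

n_tau_m3  = 12.0 * kT_J / (sigma_v * Q_J * eta_r * eta_h)
n_tau_cm3 = n_tau_m3 / 1.0e6     # convert m^-3 s to cm^-3 s

print(f"break-even n*tau ~ {n_tau_cm3:.1e} cm^-3 s")
```

With both efficiencies set to 1, the same expression gives a few times 10¹³ cm⁻³ s, which is why the quoted lower-limit range depends so strongly on the assumed heating and recovery efficiencies.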
Lawsonite A metamorphic silicate mineral related chemically and structurally to the epidote group of minerals. Its composition is Ca(H2O)Al2(OH)2[Si2O7], and it is a sorosilicate based on the dimeric [Si2O7]6− radical, where two (SiO4) tetrahedra link at a common vertex to form a bow-tie arrangement. The water molecule is bonded directly to the calcium ion and is tightly held. Although the calcium (Ca2+) and the aluminum (Al3+) cations are octahedrally coordinated by anions (and the water molecule), and the silicon cation (Si4+) is tetrahedrally coordinated by the anions, the structure is not limitingly closest-packed. Lawsonite possesses two perfect cleavages; crystals are orthorhombic prismatic to tabular, and white to pale blue to colorless; specific gravity is 3.1, and hardness is 6.5 on Mohs scale. Lawsonite has symmetry space group Ccmm, a = 0.890, b = 0.576, c = 1.333 nanometers, and four formula units occur in the cell. See CRYSTAL STRUCTURE. Lawsonite possesses an interesting paragenesis (mineral sequence), occurring mainly along continental margins (subduction zones), and it is a good indicator mineral for high-pressure (6–12 kilobars or 600–1200 megapascals) and low-temperature (300–450◦C or 570–840◦F) assemblages. It is associated with the glaucophane schists (or blueschists) of the Franciscan Formation in California, the Piedmont metamorphics of Italy, and schists in New Zealand, New Caledonia, and Japan. The premetamorphic assemblage is inferred to be largely pelagic and argillaceous (siliceous) sediments derived from basalt weathering and the like. See BLUESCHIST; GLAUCOPHANE; SCHIST. For metamorphism to lawsonite-glaucophane rocks, the precursor fluid phase must be relatively high in water. One reaction curve has a steep pressure slope with lower limit at about 350◦C (660◦F) and 6 kb (600 MPa) and an upper limit at about 450◦C (840◦F) and 12 kb (1200 MPa). See AMPHIBOLE; EPIDOTE; METAMORPHISM; SILICATE MINERALS. Paul B. Moore
Layered intrusion In geology, an igneous rock body of large dimensions, 5–300 mi (8–480 km) across and as much as 23,000 ft (7000 m) thick, within which distinct subhorizontal stratification is apparent and may be continuous over great distances, sometimes more than 60 mi (100 km). Although conspicuous layering may be found in other rocks of syenitic to granitic composition that are richer in silica, the great layered intrusions (complexes) of the world are, in an overall sense, of tholeiitic basaltic composition. (They may be viewed as intrusive analogs to continental flood basalts.) Indeed, their basaltic composition is of paramount significance to their origin. Only basaltic melts, originating in the mantle beneath the Earth’s crust, are both voluminous enough to occupy vast magma chambers and fluid enough for mineral layering to develop readily. The relatively low viscosity of basaltic melt is a consequence of its high temperature, 2100–2200◦F (1150–1200◦C), derived from the mantle source region, and its silica-poor, magnesium- and iron-rich (mafic) composition. See BASALT; EARTH; EARTH CRUST; MAGMA. Cumulus concept. Layered mafic complexes develop upon intrusion of large volumes of basaltic magma (120–24,000 mi3 or 500–100,000 km3) into more or less funnel-shaped (smaller intrusions) or dish-shaped (larger complexes) chambers 3–5 mi (5–8 km) beneath the Earth’s surface. Although the 1970s and 1980s produced an unusual amount of debate about the development of igneous mineral layering, it is still widely held that such layering is dominantly produced by gravitational settling of early-formed (cumulus) crystals. These crystals begin to grow as the magma cools and, on reaching a critical size, begin to sink because of their greater density relative to that of the hot silicate melt. Although the sequential order of mineral crystallization can vary depending on subtle differences in magma chemistry, a classic sequence of crystallization from basaltic magma is olivine, (Mg,Fe)SiO4; orthopyroxene, (Mg,Fe)SiO3; clinopyroxene, (Ca,Mg,Fe)SiO3; plagioclase (Ca,Na)(Si,Al)4O8 [Fig. 1]. The oxide minerals that are rich in chromium, iron, and titanium, namely chromite, magnetite, and ilmenite, are also common cumulus minerals. Of these silicate and oxide minerals, only plagioclase is light in color. Hence, on various scales, the relative proportion of plagioclase between and within layers is particularly effective in creating the layered aspect of many layered intrusions (for example, Fig. 2). Although layers containing only one cumulus mineral may form under special circumstances, coprecipitation of two or three cumulus minerals—for example, olivine + orthopyroxene; orthopyroxene + plagioclase; or orthopyroxene + clinopyroxene + plagioclase—is more common. Under the influence of gravity and current movements in the cooling, tabular magma chamber, the cumulus minerals accumulate on the ever-rising floor of the chamber. Later recognized as well-formed to rounded grains (Fig. 3), these cumulus minerals touch only at occasional
Fig. 1. Sequence of crystallization from basaltic magma. (a) Hypothetical phase diagram showing the derivation of the sequence olivine cumulate, orthopyroxene-olivine cumulate, orthopyroxene cumulate, as observed in cyclic units near the base of many layered mafic complexes. The triangular diagram is divided by cotectic lines (in color) into four fields in which melt compositions can be represented. For melt compositions within each field, only the single mineral for which the field is named will crystallize as the melt cools. Thus, because of withdrawal of the constituents of the crystallizing mineral, the melt changes in composition and migrates to a cotectic line. On reaching a cotectic line, the two minerals for which adjacent fields are named will crystallize together as the magma temperature falls (following the arrowheads on the cotectic lines). Plagioclase-orthopyroxene cumulate consists of plagioclase-clinopyroxene-orthopyroxene or plagioclase-clinopyroxene. (b) Hypothetical cyclic unit as derived from a liquid that moved from A to B to C. (After T. N. Irvine, Crystallization sequences in the Muskox intrusion, in D. J. L. Visser and G. von Gruenewaldt, eds., Symposium on the Bushveld Igneous Complex and Other Layered Intrusions, pp. 441–476, 1969)
points and are immersed in residual, interstitial melt. As cooling proceeds, the interstitial melt typically solidifies as a mosaic of poorly shaped (anhedral) interlocking grains of one or more minerals that form
Fig. 2. A sequence of layers of near-constant thickness and aspect that show a rather abrupt transition from a base rich in pyroxene (pyroxene-plagioclase cumulate) to a top composed nearly entirely of plagioclase (plagioclase cumulate). Ice axe is approximately 3 ft (1 m) long; LaPerouse layered gabbro, Fairweather Range, Alaska.
later in the overall sequence of crystallization (Fig. 3). In other cases, particularly when pyroxene and plagioclase are cumulus phases, interstitial melt also may crystallize as overgrowths on cumulus minerals. The solid rock thus formed is known as a cumulate, a term that emphasizes its mode of origin and predominant content of cumulus minerals. Cumulate layering. Lithologic layering within a layered mafic intrusion is typically displayed on a variety of scales. On the broadest scale, such an intrusion may contain ultramafic cumulates rich in olivine and orthopyroxene at its base; mafic pyroxene- and plagioclase-rich cumulates at intermediate levels; and more evolved plagioclase-rich cumulates, or even granitic (granophyric) rocks, near its top. In addition to the relatively conspicuous changes in mineral occurrence and relative abundance, there is generally also a so-called cryptic evolution in mineral chemistry. Early-formed olivine and pyroxene are rich in magnesium (Mg), but become richer in iron (Fe) as successively younger cumulates are deposited; similarly, plagioclase evolves from compositions richer in calcium (Ca) to compositions richer in sodium (Na). Within this broad framework, relatively
Fig. 3. Photomicrograph of an olivine cumulate from the LaPerouse gabbro. Early-crystallized olivine grains (ol; now fractured and incipiently altered) are enclosed in interstitial plagioclase (pl) and orthopyroxene (px). Mineral proportions suggest that the rock could represent crystallization of a melt with a bulk composition similar to that represented by point A in Fig. 1.
homogeneous cumulate layers range in thickness from 1000 to 2000 ft (300 to 600 m) down to less than 1 in. (2.54 cm). Commonly, layering is rhythmic or repetitious (for example, Fig. 2). The cyclic unit modeled in Fig. 1 may be repeated 15 times in the Ultramafic series of the Stillwater Complex, Montana, where each unit ranges from 80 to 600 ft (24 to 180 m) in stratigraphic thickness. Mechanics of layering. If a tabular body of basaltic magma were to deposit a cumulate sequence without disturbance, this sequence would not be expected to display many of the observed abrupt changes in cumulus mineral assemblage and chemistry. Thus, commonly observed aberrations in lithologic layering reveal that unusual processes led to their formation. Indeed, many details of these processes have remained speculative. At least three processes are thought to be commonplace: (1) convective overturn of magma within the chamber; (2) reinjection of new batches of similar magma from the mantle source region; and (3) injection of a chemically distinct, second magma type. In the strictest sense, the first process should produce layered rocks in which mineral proportions and compositions evolve, albeit in a stepwise manner; the second should cause repetition of the initial cumulus assemblage and mineral compositions; while the third should produce such aberrations as monomineralic cumulates, for example, of plagioclase or chromite. One or another of these processes may characterize small layered intrusions, whereas all may have operated to produce the cumulate succession of large complexes. Additional evidence of active current movements within magma chambers are features analogous with sedimentary processes, such as size grading within cumulate layers, crossbedding, and occasional scour-and-fill structures. Notable mafic complexes. Geologists have identified several dozen layered mafic complexes, rang-
ing in age from 2700 million years (Stillwater Complex) to 25 million years (LaPerouse, Alaska). By virtue of its enormous size (underlying 25,000 mi2 or 65,000 km2) and fabulous mineral wealth, the Bushveld Complex, South Africa, is probably the world’s best-known layered intrusion. Quite similar in many respects are the Dufek intrusion of Antarctica (mostly underlying more than 19,000 mi2 or 49,000 km2 of glacial ice; Fig. 4) and the Stillwater Complex. The Stillwater Complex has been tilted 60◦ by subsequent tectonic movements, and the 16,000ft (5000-m) cumulate sequence is now revealed in a 30-mi-long (48-km) cross section. The Great Dyke of Zimbabwe (3.5 by 330 mi or 5.6 by 530 km), the Muskox intrusion of northern Canada, and several complexes in Western Australia are also impressively large. Relatively small complexes such as the Skaergaard intrusion of East Greenland (4.5 by 7 mi or 7 by 11 km), the Rhum intrusion of Scotland, and the Kiglapait intrusion of Labrador (17 by 20 mi or 27 by 32 km) have been studied extensively and have contributed much to the understanding of cumulate processes. The mafic intrusions at Sudbury, Canada, and Duluth, Minnesota, do not qualify as layered complexes in the strictest sense, because of a general absence of well-defined layering. Economic considerations. Study of layered mafic complexes is of far more than academic interest because many of them host important deposits of chromium (Cr), copper (Cu), nickel (Ni), titanium (Ti), vanadium (V), and the platinum-group elements [platinum (Pt), palladium (Pd), iridium (Ir), osmium (Os), and rhodium (Rh)]. Each of these elements
Fig. 4. Repetitive darker layers of pyroxene cumulates within a thick sequence of relatively monotonous orthopyroxene-plagioclase cumulates, Dufek intrusion, Antarctica. Total thickness of cumulates shown is 1750 ft (530 m). (Art Ford, U.S. Geological Survey)
has a relatively high initial concentration in mafic magmas, but all must be dramatically concentrated within restricted layers to be recoverable economically. The Bushveld Complex of South Africa has been a dominant source of the world’s Cr, Pt, Pd, and V since the 1930s. Significant deposits of Cr in layered complexes are readily recognized because they occur as black, dense, nearly monomineralic cumulate layers of the mineral chromite, (Cr,Fe)3O4. Such chromite layers may be as thick as 6 ft (2 m) in the Bushveld Complex and have been mapped continuously over distances of tens of miles. Similar, less thick, and continuous layers have been mined from other complexes, including the lower part of the Stillwater Complex, where the chromite is, however, of significantly less value because the proportion of Cr to Fe is substantially lower. The origin and discovery of platinum-group-element concentrations in layered mafic complexes is more problematic. Indeed, despite intimate knowledge of the Merensky Reef, principal host to the platinoids in the Bushveld Complex, years of intense prospecting preceded discovery of a remarkably similar horizon, the J-M Reef, in the Stillwater Complex. Much of the difficulty experienced in locating the Pt-Pd-enriched horizon in the Stillwater Complex is a reflection of the fact that the Merensky and J-M reefs cannot be readily understood within the framework of the cumulate theory that explains virtually every other feature of these two complexes. It appears that an influx of new magma of undetermined composition, as well as concentration of volatile constituents (such as water and chlorine), trace elements, and platinum-group elements from vast amounts of mafic magma, is essential to formation of a deposit from which Pt and Pd can be recovered profitably. In the Bushveld and Stillwater complexes, the platinum-group elements occur as alloys and in combination with sulfur (S), tellurium (Te), bismuth (Bi), selenium (Se), and other rare elements as minor components of a locally irregular but laterally persistent layer only a few feet thick. See IGNEOUS ROCKS. Gerald K. Czamanske Bibliography. G. K. Czamanske and M. L. Zientek (eds.), The Stillwater Complex, Montana: Geology and Guide, 1985; I. Parsons (ed.), Origins of Igneous Layering, 2d ed., 1990; D. J. L. Visser and G. von Gruenewaldt (eds.), Symposium on the Bushveld Igneous Complex and Other Layered Intrusions, 1969; L. R. Wager and G. M. Brown, Layered Igneous Rocks, 1967.
Layout drawing A design drawing or graphical statement of the overall form of a component or device, which is usually prepared during the innovative stages of a design. Since it lacks detail and completeness, a layout drawing provides a faithful explanation of the device and its construction only to individuals such as designers and drafters who have been intimately involved in the conceptual stage. In a sense, the layout draw-
ing is a running record of ideas and problems posed as the design evolves. In the layout drawing, for instance, considerations of kinematic design of a mechanical component are explored graphically in incomplete detail, showing only those aspects of the elements and their interrelationships to be considered in the design. In most cases the layout drawing ultimately becomes the primary source of information from which detail drawings and assembly drawings are prepared by other drafters under the guidance of the designer. See DRAFTING; ENGINEERING DRAWING. Robert W. Mann
Lazurite The chief mineral constituent in the ornamental stone lapis lazuli. It crystallizes in the isometric system, but well-formed crystals, usually dodecahedral, are rare. Most commonly, it is granular or in compact masses. There is imperfect dodecahedral cleavage. The hardness is 5–5.5 on Mohs scale, and the specific gravity is 2.4–2.5. There is vitreous luster and the color is a deep azure, more rarely a greenish-blue. Lazurite is a tectosilicate, the composition of which is expressed by the formula Na4Al3Si3O12S, but some S may be replaced by SO4 or Cl. Lazurite is soluble in HCl with the evolution of hydrogen sulfide. Lazurite is a feldspathoid but, unlike the other members of that group, is not found in igneous rocks. It occurs exclusively in crystalline limestones as a contact metamorphic mineral. Lapis lazuli is a mixture of lazurite with other silicates and calcite and usually contains disseminated pyrite. Lapis lazuli has long been valued as an ornamental material. Lazurite was formerly used as a blue pigment, ultramarine, in oil painting. Localities of occurrence are in Afghanistan; Lake Baikal, Siberia; Chile; and San Bernardino County, California. See FELDSPATHOID; SILICATE MINERALS. Cornelius S. Hurlbut, Jr.
Le Chatelier’s principle A description of the response of a system in equilibrium to a change in one of the variables determining the equilibrium. Le Chatelier’s principle is often stated as the following: For a system in equilibrium, a change in a variable shifts the equilibrium in the direction that opposes the change in that variable. However, this statement is false, since the shift induced by a change in a variable is sometimes in the direction that augments (rather than opposes) the change in the variable. Instead of a single general statement, the response of a system in equilibrium to a change in a variable is best described by a set of particular statements. The following statements are all valid consequences of the laws of thermodynamics. See CHEMICAL THERMODYNAMICS. For any chemical reaction equilibrium or phase equilibrium, an increase in temperature at constant pressure shifts the equilibrium in the direction in which heat is absorbed by the system. For example,
the gas-phase reaction N2 + 3H2 → 2NH3 is exothermic (heat-producing), so an increase in temperature at constant pressure will shift the equilibrium to the left to produce more N2 and H2. See CHEMICAL EQUILIBRIUM; PHASE EQUILIBRIUM. For any reaction equilibrium or phase equilibrium, an increase in pressure at constant temperature shifts the equilibrium in the direction in which the volume of the system decreases. For example, the gas-phase reaction N2 + 3H2 ⇌ 2NH3 has fewer moles of gas on the right than on the left, so the products occupy a smaller volume than the reactants, and a pressure increase at constant temperature will shift the equilibrium to the right to produce more NH3. For a reaction equilibrium in a dilute solution, addition of a small amount of a solute species that participates in the reaction will shift the equilibrium in the direction that uses up some of the added solute. For an ideal-gas reaction equilibrium, addition at constant temperature and volume of a species that participates in the reaction shifts the equilibrium in the direction that consumes some of the added species. For an ideal-gas reaction equilibrium, addition at constant temperature and pressure of a species that participates in the reaction might shift the equilibrium to produce more of the added species or might shift the equilibrium to use up some of the added species; the direction of the shift depends on the reaction, on which species is added, and on the initial composition of the equilibrium mixture. For example, for the gas-phase reaction N2 + 3H2 ⇌ 2NH3, addition of a small amount of N2 at constant temperature and pressure will shift the equilibrium to the left to produce more N2 if the initial mole fraction of N2 is greater than 0.5, and will shift the equilibrium to the right to consume some N2 if the initial N2 mole fraction is less than 0.5; addition of H2 or NH3 at constant temperature and pressure will always shift the equilibrium to consume some of the added species. General conditions determining which way an ideal-gas equilibrium will shift under constant-temperature-and-pressure addition of a species can be derived by using thermodynamics. Ira N. Levine Bibliography. G. W. Castellan, Physical Chemistry, 3d ed., 1983; J. de Heer, Le Chatelier, scientific principle, or “sacred cow,” J. Chem. Educ., 35:133, 135–136, 1958; J. de Heer, The principle of Le Chatelier and Braun, J. Chem. Educ., 34:375–380, 1957; I. N. Levine, Physical Chemistry, 4th ed., 1994; R. S. Treptow, Le Chatelier’s principle, J. Chem. Educ., 57:417–420, 1980.
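The 0.5 threshold quoted above for addition of N2 at constant temperature and pressure can be made explicit with a short calculation (a supplementary sketch, not part of the original article; ni denotes the moles of species i and n the total moles). Differentiating the mole-fraction reaction quotient for N2 + 3H2 ⇌ 2NH3 at fixed T, P, nH2, and nNH3 gives

Q_x = \frac{x_{\mathrm{NH_3}}^{2}}{x_{\mathrm{N_2}}\,x_{\mathrm{H_2}}^{3}}, \qquad x_i = \frac{n_i}{n},
\qquad
\left(\frac{\partial \ln Q_x}{\partial n_{\mathrm{N_2}}}\right)_{T,P,\,n_{\mathrm{H_2}},\,n_{\mathrm{NH_3}}}
= -\frac{1}{n_{\mathrm{N_2}}} + \frac{2}{n}
= \frac{2x_{\mathrm{N_2}} - 1}{n_{\mathrm{N_2}}}.

Because K_x is fixed at constant temperature and pressure, the added N2 raises Q_x above K_x when x_N2 > 0.5 (the equilibrium shifts left, producing more N2) and lowers it when x_N2 < 0.5 (the equilibrium shifts right, consuming some of the added N2), in agreement with the statements above.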
Leaching The removal of a soluble fraction, in the form of a solution, from an insoluble, permeable solid with which it is associated. The separation usually involves selective dissolving, with or without diffusion, but in the extreme case of simple washing it consists merely of the displacement (with some mixing) of one interstitial liquid by another with which it is miscible. The soluble constituent may be solid (as the metal leached from ore) or liquid (as the oil leached from soybeans). Leaching is closely related to solvent extraction, in which a soluble substance is dissolved from one liquid by a second liquid immiscible with the first. Both leaching and solvent extraction are often called extraction. Because of its variety of applications and its importance to several ancient industries, leaching is known by a number of other names: solid-liquid extraction, lixiviation, percolation, infusion, washing, and decantation-settling. The liquid used to leach away the soluble material (the solute) is termed the solvent. The resulting solution is called the extract or sometimes the miscella. The mechanism of leaching may involve simple physical solution, or dissolution made possible by chemical reaction. The rate of transportation of solvent into the mass to be leached, or of soluble fraction into the solvent, or of extract solution out of the insoluble material, or some combination of these rates may be significant. A membranous resistance may be involved. A chemical reaction rate may also affect the rate of leaching. The general complication of this simple-appearing process results in design by chiefly empirical methods. Whatever the mechanism, however, it is clear that the leaching process is favored by increased surface per unit volume of solids to be leached and by decreased radial distances that must be traversed within the solids, both of which are favored by decreased particle size. Fine solids, on the other hand, cause mechanical operating problems during leaching, slow filtration and drying rates, and possible poor quality of solid product. The basis for an optimum particle size is established by these characteristics. Leaching processes fall into two principal classes:
Fig. 1. Bollman extractor. (After W. L. McCabe and J. C. Smith, Unit Operations of Chemical Engineering, 2d ed., McGraw-Hill, 1967)
Fig. 2. Hildebrandt screw-conveyor extractor. (After W. L. McCabe and J. C. Smith, Unit Operations of Chemical Engineering, 2d ed., McGraw-Hill, 1967)
Fig. 3. Kennedy extractor. (After R. H. Perry and C. H. Chilton, eds., Chemical Engineer’s Handbook, 5th ed., McGraw-Hill, 1973)
those in which the leaching is accomplished by percolation (seeping of solvent through a bed of solids), and those in which particulate solids are dispersed into the extracting liquid and subsequently separated from it. In either case, the operation may be a batch process or continuous. See EXTRACTION; SOLVENT EXTRACTION. Percolation. In addition to being applied to ores and rock in place and by the simple technique of heap leaching, percolation is carried out in batch tanks and in several designs of continuous extractors. The batch percolator is a large circular or rectangular tank with a false bottom. The solids to be leached are dumped into the tank to a uniform depth. They are sprayed with solvent until their solute content is reduced to an economic minimum and are then excavated. A simple example is the brewing of coffee in a percolator (repeated extraction) or a drip pot (once-through). Countercurrent flow of the solvent through a series of tanks is common, with fresh solvent entering the tank containing the most nearly exhausted material. Some leach tanks operate under
pressure, to contain volatile solvents or increase the percolation rate. Continuous percolators employ the moving-bed principle, implemented by moving baskets that carry the solids past solvent sprays, belt or screw conveyors that move them through streams or showers of solvent, or rakes that transport them along a solventfilled trough. In a revolving-basket type like the Rotocel extractor, bottomless compartments move in a circular path over a stationary perforated annular disk. They are successively filled with solids, passed under solvent sprays connected by pumps so as to provide countercurrent flow of the extracting liquid, and emptied through a large opening in the disk. Alternating perforated-bottom extraction baskets may be arranged in a bucket-elevator configuration, as in the Bollman extractor (Fig. 1). On the up cycle, the partially extracted solids are percolated by fresh solvent sprayed at the top; on the down cycle, fresh solids are sprayed with the extract from the up cycle. A screw-conveyor extractor like the Hildebrandt extractor (Fig. 2) moves solids through a V-shaped line in a direction opposite to the flow of solution. The solids may be conveyed instead by rakes or paddles along a horizontal or inclined trough counter to the direction of solvent flow, as in the Kennedy extractor (Fig. 3). In the last two types, the action is predominantly percolation but involves some solids dispersal because of agitation by the conveyors. Horizontal continuous vacuum filters of the belt, tray, or table type sometimes are used as leaching equipment. See FILTRATION. Dispersed-solids leaching. Equipment for leaching fine solids by dispersion and separation, a particularly useful technique for solids which disintegrate during leaching, includes batch tanks and continuous extractors. Inasmuch as the purpose of the dispersion is usually only to permit exposure of the particles to unsaturated solvent, the agitation in a batch-stirred extractor need not be intense. Air agitation is often used. Examples are Pachuca tanks, large cylinders with conical bottoms and an axial air nozzle or airlift tube. If mechanical agitation is employed, a slow paddle is sufficient. The Dorr agitator (Fig. 4) combines a rake with an air lift. In all cases, the mixture of solids and liquid is stirred until the maximum economical degree of leaching has occurred. The solids
Fig. 4. Dorr agitator for batch washing of precipitates. (After W. L. Badger and J. T. Banchero, Introduction to Chemical Engineering, McGraw-Hill, 1955)
Fig. 5. Bonotto extractor. (After R. H. Perry and C. H. Chilton, eds., Chemical Engineer’s Handbook, 5th ed., McGraw-Hill, 1973)
are then allowed to settle. The extract is decanted, and the solids, sometimes after successive treatments with fresh solvent, are removed by shoveling or flushing. Continous dispersed-solids leaching is accomplished in gravity sedimentation tanks or in vertical plate-extractors. An example of the latter is the Bonotto extractor shown in Fig. 5. Staggered openings in the plates allow the solids, moved around each plate by a wiping radial blade, to cascade downward from plate to plate through upward-flowing solvent. Gravity sedimentation thickeners can serve as effective continuous contacting and separating devices for leaching fine solids. A series of such units properly connected provides true continuous countercurrent washing (known as CCD for continuous countercurrent decantation) of the solids. See COUNTERCURRENT TRANSFER OPERATIONS; THICKENING. Shelby A. Miller Bibliography. C. J. King, Separation Processes, 2d ed., 1980; Kirk-Othmer Encyclopedia of Chemical Technology, 4th ed., 1994; W. L. McCabe, J. C. Smith, and P. Harriot, Unit Operations and Chemical Engineering, 6th ed., 2000; R. H. Perry and D. Green (eds.), Chemical Engineers’ Handbook,
7th ed., 1997; J. R. Richardson and D. G. Peacock (eds.), Chemical Engineering, 1979; R. E. Treybal, Mass-Transfer Operations: Chemical Engineering, 3d ed., 1980.
Lead A chemical element, Pb, atomic number 82 and atomic weight 207.19. Lead is a heavy metal (specific gravity 11.34 at 16◦C or 61◦F), of bluish color, which tarnishes to dull gray. It is pliable, inelastic, easily fusible, melts at 327.4◦C (621.3◦F), and boils at 1740◦C (3164◦F). The normal chemical valences are 2 and 4. It is relatively resistant to attack by sulfuric and hydrochloric acids but dissolves slowly in nitric acid. Lead is amphoteric, forming lead salts of acids as well as metal salts of plumbic acid. Lead forms many salts, oxides, and organometallic compounds. See PERIODIC TABLE.
Industrially, the most important lead compounds are the lead oxides and tetraethyllead. Lead forms alloys with many metals and is generally employed in the form of alloys in most applications. Alloys formed with tin, copper, arsenic, antimony, bismuth, cadmium, and sodium are all of industrial importance. See LEAD ALLOYS. Lead compounds are toxic and have resulted in poisoning of workers from misuse and overexposure. However, lead poisoning is presently rare because of the industrial application of modern hygienic and engineering controls. The greatest hazard arises from the inhalation of vapor or dust. In the case of organolead compounds, absorption through the skin may become significant. Some of the symptoms of lead poisoning are headaches, dizziness, and insomnia. In acute cases there is usually stupor, which progresses to coma and terminates in death. The medical control of employees engaged in lead usage involves precise clinical tests of lead levels in blood and urine. With such control and the proper application of engineering control, industrial lead poisoning may be entirely prevented. Lead rarely occurs in its elemental state. The most common ore is the sulfide, galena. The other minerals of commercial importance are the carbonate,
cerussite, and the sulfate, anglesite, which are much more rare. Lead also occurs in various uranium and thorium minerals, arising directly from radioactive decay. Commercial lead ores may contain as little as 3% lead, but a lead content of about 10% is most common. The ores are concentrated to 40% or greater lead content before smelting. See LEAD METALLURGY. The largest single use of lead is for the manufacture of storage batteries. Other important applications are for the manufacture of cable covering, construction, pigments, and ammunition. Organolead compounds are being developed for applications such as catalysts for polyurethane foams, marine antifouling paint toxicants, biocidal agents against gram-positive bacteria, protection of wood against marine borers and fungal attack, preservatives for cotton against rot and mildew, molluscicidal agents, anthelmintic agents, wear-reducing agents in lubricants, and corrosion inhibitors for steel. Because of its excellent resistance to corrosion, lead finds extensive use in construction, particularly in the chemical industry. It is resistant to attack by many acids because it forms its own protective oxide coating. Because of this advantageous characteristic, lead is used widely in the manufacture and handling of sulfuric acid. Lead has long been used as protective shielding for x-ray machines. Because of the expanded applications of atomic energy, radiation-shielding applications of lead have become increasingly important. Lead sheathing for telephone and television cables continues to be a sizable outlet for lead. The unique ductility of lead makes it particularly suitable for this application because it can be extruded in a continuous sheath around the internal conductors. The use of lead in pigments has been a major outlet for lead but is decreasing in volume. White lead, 2PbCO3 · Pb(OH)2, is the most extensively used lead pigment. Other lead pigments of importance are basic lead sulfate and lead chromates. A considerable variety of lead compounds, such as silicates, carbonates, and salts of organic acids, are used as heat and light stabilizers for polyvinyl chloride plastics. Lead silicates are used for the manufacture of glass and ceramic frits, which are useful in introducing lead into glass and ceramic finishes. Lead azide, Pb(N3)2, is the standard detonator for explosives. Lead arsenates are used in large quantities as insecticides for crop protection. Litharge (lead oxide) is widely employed to improve the magnetic properties of barium ferrite ceramic magnets. Also, a calcined mixture of lead zirconate and lead titanate, known as PZT, is finding increasing markets as a piezoelectric material. Hymin Shapiro; James D. Johnston Bibliography. American Society for Metals, Metals Handbook, 2d ed., 1999; P. W. Atkins et al., Inorganic Chemistry, 4th ed., 2006; J. Casas and J. Sordo (eds.), Lead: Chemistry, Analytical Aspects, Environmental Impact and Health Effects, 2006; F. A. Cotton et al., Advanced Inorganic Chemistry, 6th ed., Wiley-Interscience, 1999; Z. Rappoport
(ed.), The Chemistry of Organic Germanium, Tin and Lead Compounds, 2003.
Lead alloys Substances formed by the addition of one or more elements, usually metals, to lead. Lead alloys may exhibit greatly improved mechanical or chemical properties as compared to pure lead. The major alloying additions to lead are antimony and tin. The solubilities of most other elements in lead are small, but even fractional weight percent additions of some of these elements, notably copper and arsenic, can alter properties appreciably. Cable-sheathing alloys. Lead is used as a sheath over the electrical components to protect power and telephone cable from moisture. Alloys containing 1% antimony are used for telephone cable, and leadarsenical alloys, containing 0.15% arsenic, 0.1% tin, and 0.1% bismuth, for example, are used for power cable. Aluminum and plastic cable sheathing have replaced lead alloy sheathing in many applications. Battery-grid alloys. Lead alloy grids are used in the lead-acid storage battery (the type used in automobiles) to support the active material composing the plates. Lead grid alloys contain 6–12% antimony for strength, small amounts of tin to improve castability, and one or more other minor additions to retard dimensional change in service. No lead alloys capable of replacing the lead-antimony alloys in automobile batteries have been developed. An alloy containing 0.03% calcium for use in large stationary batteries has had success. Chemical-resistant alloys. Lead alloys are used extensively in many applications requiring resistance to water, atmosphere, or chemical corrosion. They are noted for their resistance to attack by sulfuric acid. Alloys most commonly used contain 0.06% copper, or 1–12% antimony, where greater strength is needed. The presence of antimony lowers corrosion resistance to some degree. Type metals. Type metals contain 21/2 –12% tin and 21/2 –25% antimony. Antimony increases hardness and reduces shrinkage during solidification. Tin improves fluidity and reproduction of detail. Both elements lower the melting temperature of the alloy. Common type metals melt at 460–475◦F (238– 246◦C). Bearing metals. Lead bearing metals (babbitt metals) contain 10–15% antimony, 5–10% tin, and for some applications, small amounts of arsenic or copper. Tin and antimony combine to form a compound which provides wear resistance. These alloys find frequent application in cast sleeve bearings, and are used extensively in freight-car journal bearings. In some cast bearing bronzes, the lead content may exceed 25%. See ANTIFRICTION BEARING. Solders. A large number of lead-base solder compositions have been developed. Most contain large amounts of tin with selected minor additions to provide specific benefits, such as improved wetting characteristics. See SOLDERING.
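The composition ranges quoted in the preceding paragraphs can be collected into a small lookup structure. The following sketch is illustrative only (Python; the antimony and tin ranges are the ones given in this article, while the names, the 1% cap taken here for the "small amounts" of tin in battery grids, and the function itself are choices made for this example):

# Antimony and tin ranges (weight %) for lead-alloy families described
# in this article; other minor additions (As, Cu, Ca, and so on) are ignored.
LEAD_ALLOY_FAMILIES = {
    "battery grid": {"Sb": (6.0, 12.0), "Sn": (0.0, 1.0)},   # Sn: "small amounts", capped at 1% here
    "type metal": {"Sb": (2.5, 25.0), "Sn": (2.5, 12.0)},
    "bearing metal (babbitt)": {"Sb": (10.0, 15.0), "Sn": (5.0, 10.0)},
}

def matching_families(sb_percent, sn_percent):
    """Return the families whose Sb and Sn ranges contain the given composition."""
    hits = []
    for name, ranges in LEAD_ALLOY_FAMILIES.items():
        sb_lo, sb_hi = ranges["Sb"]
        sn_lo, sn_hi = ranges["Sn"]
        if sb_lo <= sb_percent <= sb_hi and sn_lo <= sn_percent <= sn_hi:
            hits.append(name)
    return hits

# A 12% Sb, 6% Sn melt falls within both the type-metal and
# bearing-metal ranges; such overlaps are expected.
print(matching_families(12.0, 6.0))

Because the ranges overlap, composition alone does not identify an application; the minor additions and the service requirements described above decide the choice.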
Free-machining brasses, bronzes, steels. Lead is added in amounts from 1 to 25% to brasses and bronzes to improve machining characteristics. Lead remains as discrete particles in these alloys. It is also added to some construction steel products to increase machinability. Only about 0.1% is needed, but the tonnage involved is so large that this forms an important use for lead. See ALLOY; LEAD; LEAD METALLURGY; TIN ALLOYS. Dean N. Williams Bibliography. American Society for Testing and Materials, Nonferrous Metal Products, vol. 02.04, 1986; W. Hofmann, Lead and Lead Alloys: Properties and Technology, 1970; T. J. Bertone and J. Neely, Practical Metallurgy and Materials of Industry, 5th ed., 1999.
Lead isotopes (geochemistry) The study of the isotopic composition of stable and radioactive lead in geological and environmental materials to determine their ages or origins. See LEAD.
Fig. 1. Systematics of Uranium-lead (U-Pb) dating. For a closed system containing uranium but no primary lead, the ratios of 206Pb/238U and 207Pb/235U will vary with the age of the sample, as shown by the concordia line. Ages indicated by marks along each line are in units of 1 billion years. The data for uraninites are consistent with this. For systems where episodic losses of lead have occurred in the past, values may lie along a discordia, as shown for zircons from the Little Belt Mountains of Montana.
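The concordia in Fig. 1 can be written explicitly. For a phase that crystallized with no initial lead and has remained a closed system for a time t, the radiogenic (starred) daughter/parent ratios follow directly from the decay law (a supplementary note; λ238 and λ235 denote the decay constants of 238U and 235U, whose numerical values are not quoted in the article):

\frac{^{206}\mathrm{Pb}^{*}}{^{238}\mathrm{U}} = e^{\lambda_{238} t} - 1,
\qquad
\frac{^{207}\mathrm{Pb}^{*}}{^{235}\mathrm{U}} = e^{\lambda_{235} t} - 1.

The concordia is the curve traced out by this pair of ratios as t varies, which is why a sample with concordant ages plots on it at a single point.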
Stable Isotopes Lead isotope geochemistry provides the principal method for determining the ages of old rocks and the Earth itself, as well as the sources of metals in mineral deposits and the evolution of the mantle. Geochronology. Lead (Pb) has four stable isotopes of mass 204, 206, 207, and 208. Three are produced by the radioactive decay of uranium (U) and thorium (Th) [reactions (1)–(3), where t1/2 is the half-life of 238
U (t1/2 = 4.5 × 10^9 years) → 206Pb + 8α + 6β        (1)
235U (t1/2 = 0.71 × 10^9 years) → 207Pb + 7α + 4β        (2)
232Th (t1/2 = 13.9 × 10^9 years) → 208Pb + 6α + 4β        (3)
the isotope and α and β denote alpha and beta particles, respectively]. The lead produced by the decay of uranium and thorium is termed radiogenic. Since 204 Pb is not produced by the decay of any naturally occurring radionuclide, it can be used as a monitor of the amount of initial (nonradiogenic) lead in a system. This will include all of the 204Pb and variable amounts of 206Pb, 207Pb, and 208Pb. See ALPHA PARTICLES; BETA PARTICLES; RADIOACTIVITY; THORIUM; URANIUM. Closed systems. It is possible to calculate the isotopic composition of lead at any time t in the past by calculating and deducting the amount of radiogenic lead that will have accumulated, provided a mineral or rock represents a closed system. A closed system is one in which there has been no chemical transfer of uranium, thorium, or lead in or out of the mineral or rock since it formed. All calculations for uraniumlead dating should yield the same age; this is a unique and powerful property. The ratio of radiogenic 207Pb to 206Pb is simply a function of age, not the U/Pb
ratio. Certain minerals such as zircon, monazite, and uraninite are particularly well suited for dating because of extremely high concentrations of uranium or thorium relative to initial lead. However, the degree to which they behave as closed systems can vary. For samples having concordant U-Pb ages, data lie along a curved line called the concordia, the line defined by the daughter/parent ratios of each isotopic system that have equal ages (Fig. 1). There are several uranium-rich minerals that commonly yield concordant ages, the most useful being the rare-earth phosphate monazite, a common accessory mineral in crustal rocks. However, there are many minerals with appreciable radiogenic lead which have discordant ages, indicating they have not been a closed system throughout their history as a discrete phase. Zircon (ZrSiO4), a common accessory mineral in many types of crustal rocks, has been used more than any other phase for U-Pb dating. However, the data are normally discordant. Data for a series of zircons from the Little Belt Mountains, Montana, lie on a welldefined straight line that intersects the concordia at two points (Fig. 1). It has been shown that phases subject to lead loss (or uranium gain) during a period of time that is short compared with the age of the phase yield daughter/parent ratios defining a straight line termed a discordia. The lower intersection of the discordia with the concordia indicates the time of the episodic bulk lead loss, while the upper intersection represents the age of the phase. Discordance in zircons is more pronounced in uranium-rich varieties and is caused by the severe damage to the lattice produced by recoiling alpha particles. See MONAZITE; ROCK AGE DETERMINATION; ZIRCON. Isochron methods. Even if a rock or mineral contains appreciable initial lead, it may still be dated by using isochron methods. Since the amount of radiogenic lead relative to nonradiogenic lead is a function of the
Fig. 2. Plot of 207Pb/204Pb versus 206Pb/204Pb for troilite primordial lead and selected stone meteorites. The slope of the primary isochron (Geochron) for modern lead indicates an age of 4.55 billion years for these materials. The white rectangular area illustrates the range of variation in most terrestrial leads and corresponds approximately to the region detailed in Fig. 3.
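The age quoted for the meteorite isochron follows from its slope. For samples that formed at a time T with a common primordial lead composition (subscript 0) but different U/Pb ratios, the accumulated radiogenic lead gives the relation below (a supplementary note; λ238 and λ235 are the 238U and 235U decay constants, and 137.88 is the commonly used present-day 238U/235U atomic ratio, a value not stated in the article):

\frac{(^{207}\mathrm{Pb}/^{204}\mathrm{Pb}) - (^{207}\mathrm{Pb}/^{204}\mathrm{Pb})_{0}}
     {(^{206}\mathrm{Pb}/^{204}\mathrm{Pb}) - (^{206}\mathrm{Pb}/^{204}\mathrm{Pb})_{0}}
= \frac{1}{137.88}\,\frac{e^{\lambda_{235} T} - 1}{e^{\lambda_{238} T} - 1}.

The slope therefore depends only on T, and setting T = 4.55 × 10^9 years reproduces the slope of the Geochron in Fig. 2, independent of each sample's U/Pb ratio.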
U/Pb ratio and time, the slope on a plot of 206Pb/204Pb against 238U/204Pb is proportional to age. An isochron is a line on a graph defined by data for rocks of the same age with the same initial lead isotopic composition, the slope of which is proportional to the age. In practice, the 238U/204Pb ratio may well have been disturbed by recent alteration of the rock because uranium is highly mobile in near-surface environments. For this reason it is more common to combine the two uranium decay schemes and plot 207Pb/204Pb against 206Pb/204Pb; the slope of an isochron on this plot is a function of age. Isochron dating has been used to determine an age of 4.55 billion years for the Earth and the solar system by dating iron and stony meteorites (Fig. 2). The position of data along the isochron is a function of the U/Pb ratio. The iron meteorites are particularly important for defining the initial lead isotopic composition of the solar system since they contain negligible uranium. The meteorite isochron is commonly termed the Geochron. See EARTH, AGE OF; GEOCHRONOMETRY; METEORITE. Geochemistry of Earth. By using the position of data for typical continental crustal rocks and samples of basalt that are derived as magmas from the mantle as shown on the Geochron (Fig. 2), the indication is that the silicate Earth has a U/Pb ratio of about 0.1. This is high relative to chondritic meteorites, commonly considered the best representative of primitive unprocessed preplanetary solar system material. A significant fraction of the Earth’s total lead inventory could be in the metallic core. Also, lead is extremely volatile and may have been lost at the temperatures that inner solar system objects may have experienced in their accretionary history. The Earth’s mantle has been depleted by repeated melting during its 4.55-billion-year history, and the loss of such melts should leave the mantle with a low U/Pb ratio. However, close inspection reveals that the lead isotopic compositions of most mantlederived magmas plot to the right of the Geochron (Fig. 3), implying a higher U/Pb ratio since the Earth formed. Originally it was thought that this discrep-
ancy was caused by late accretion of the Earth or late core formation, either of which would displace the mantle to the right of the Geochron. However, there is independent isotopic evidence that the Earth did not accrete late, and there are theoretical reasons why the Earth’s core almost certainly formed very early. A more likely explanation is that the mantle has been modified throughout its history by the subduction of ocean-floor basalt enriched in uranium and depleted in lead by low-temperature seawater alteration. The basalt lavas of some ocean islands such as St. Helena have especially radiogenic lead, thought to reflect an extreme example of such reenrichment. See SUBDUCTION ZONES. Tracers. Lead isotopes can serve as tracers in the lithosphere, atmosphere, and hydrosphere. Lead isotopes are commonly used to trace the sources of constituents in continental terranes, granites, ore deposits, and pollutants. For example, the class of low-temperature hydrothermal lead-zinc (Pb-Zn) mineralization known as Mississippi Valley type ore deposits have extremely variable 206Pb/204Pb ratios in their galenas, ranging up to 100. These variations reflect the time-integrated U/Pb ratio of the source of the lead, and they can be used to identify specific geological units from which the lead was scavenged. Similarly, some granites such as those of the Isle of Skye in northwest Scotland have very unradiogenic lead, indicating that the magmas were derived by melting portions of the lower continental crust that were depleted in uranium about 3 billion years ago. See ORE AND MINERAL DEPOSITS. The industrialized countries of the world use large tonnages of lead annually, about one-third of which is widely distributed in the air, water, soil, and vegetation of the environment. Isotopic composition of lead in various environmental samples has identified sources and pathways of lead pollution. Most of the lead in the atmosphere originates from the combustion of gasoline containing alkyl lead antiknock compounds. The second-largest emission source of
Fig. 3. Lead isotopic compositions of most ocean-floor and ocean-island basalts plot to the right of the Geochron defined by meteorite data (Fig. 2). The composition is the opposite of that predicted from the effects of depletion of the Earth’s mantle by partial melting and suggests reenrichment by uranium-enriched subducted ocean floor.
atmospheric lead is coal combustion. Lead aerosols eventually fall to the ground as precipitation or as dust and accumulate in topsoil and in surface water, where they may be incorporated into terrestrial or aquatic life. Lead isotopes have been used to trace contaminant dispersion in the environment. Lead isotope studies, for example, have helped support the contention that high concentrations of lead near roadways are the result of local deposition of large aerosols from automobile exhaust. Similarly, the isotopic composition of lead in natural waters and in sediments has been useful in identifying the extent to which sources are anthropogenic. See AIR POLLUTION; WATER POLLUTION. Radioactive Isotopes While there are at least 11 known radioactive isotopes of lead, only 212Pb, 214Pb, and especially 210Pb have been of interest geochemically. The usefulness of these isotopes stems from the unique mechanism by which they are separated from parent isotopes in the uranium or thorium decay series. Atmosphere. Unlike their noble-gas parents, the radioactive lead isotopes as well as other daughter products have a strong affinity for atmospheric aerosols. On formation, the daughter products exist as small positive ions associated with polarized air or water molecules; they form light aggregate particles within periods of tens of seconds. Both 212Pb and 214Pb have been used to study the process of diffusion of ions in gases and the mechanism of attachment of small ions to aerosols. Measurement of the distribution of radon (Rn) daughter product activities (212Pb and 214Pb) with respect to aerosol size has been important in the development of theoretical models of ion-aerosol interactions. The short half-lives of 212Pb and 214Pb also make these isotopes suitable for studies of near-ground atmospheric transport processes. While the short-lived lead isotopes disappear from the atmosphere primarily by radioactive decay, 210Pb, because of its longer half-life, is removed mainly by precipitation and dry deposition. Its horizontal and vertical distributions are the result of the integrated effects of the distribution and intensity of sources, the large-scale motions of the atmosphere, and the distribution and intensity of removal processes. The inventory of 210Pb in the air is about a thousand times lower than expected, given the amount of its parent 222Rn, a measure of the efficiency with which aerosols are removed from the atmosphere. Numerous measurements of 210Pb as well as other daughter products indicate a tropospheric aerosol residence time of under 1 week. Since the residence time of 210Pb is so short and the oceans are not a significant source, the isotope sometimes can be used to distinguish between air masses originating over the continents and over the oceans. See AEROSOL; AIR MASS; ATMOSPHERIC CHEMISTRY; RADON. Aquatic systems. An important use of 210Pb is as a particle tracer in aquatic systems. Concentrations of 210Pb in surface waters of the oceans generally show the same latitude variations as seen in air and
rain. Concentrations in surface waters are roughly 20 times less than expected if there were no removal mechanisms other than radioactive decay. On entering ocean waters, 210Pb is incorporated into microscopic marine organisms, particularly zooplankton, whose remains eventually sink, rapidly conveying 210 Pb to underlying waters. Using 210Pb as a tracer has helped explain the mechanisms by which various substances, including pollutants, are removed from the oceans. See SEAWATER. Most of the 210Pb in deep ocean waters is produced from the decay of dissolved 226Ra. The activity of 210Pb is as low as 20% of the activity of radium, indicating the operation of a deep-water scavenging mechanism acting preferentially on 210Pb. Measurements of 210Pb concentrations in deep ocean water indicate a scavenging residence time of around 40 years. The removal process appears to involve horizontal transport of 210Pb to selected areas of intense scavenging by sediments. One of the most important uses of 210Pb is for dating recent coastal marine and lake sediments. As the isotope is rapidly removed from water to underlying deposits, surface sediments often have a considerable excess of 210Pb. The excess is defined as that present in addition to the amount produced by the decay of radium in the sediments. When the sedimentation rate is constant and the sediments are physically undisturbed, the excess 210Pb decreases exponentially with sediment depth as a result of radioactive decay during burial. The reduction in activity at a given depth, compared with that at the surface, provides a measure of the age of the sediments at that depth. Typically, excess 210Pb can be measured for up to about five half-lives or about 100 years, and it is therefore ideally suited for dating sediments that hold records of human impact on the environment. See RADIOISOTOPE (GEOCHEMISTRY); SEDIMENTOLOGY. Alex N. Halliday; John A. Robbins Bibliography. R. E. Criss, Principles of Stable Isotope Distribution, 1999; A. P. Dickin, Radiogenic Isotope Geology, 1997; G. Faure and T. M. Mensing, Isotopes: Principles and Applications, 3d ed., 2004.
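The sediment-dating procedure described above reduces to the radioactive-decay law applied to the excess 210Pb activity. A minimal sketch in Python (the 22.3-year half-life of 210Pb is a standard value that the article does not quote, and the activities in the example are hypothetical; constant sedimentation rate and undisturbed sediment are assumed, as in the text):

import math

T_HALF_PB210 = 22.3                           # years; standard 210Pb half-life (not given in the article)
DECAY_CONSTANT = math.log(2) / T_HALF_PB210   # per year

def sediment_age(excess_surface, excess_at_depth):
    """Age of a sediment layer from its excess 210Pb activity.

    Both arguments are excess activities (activity above that supported
    by radium decay within the sediment), in the same units.
    """
    return math.log(excess_surface / excess_at_depth) / DECAY_CONSTANT

# Illustrative numbers only: a layer whose excess activity has dropped to
# one-quarter of the surface value is two half-lives (about 45 years) old.
print(round(sediment_age(20.0, 5.0), 1))

As the article notes, the excess can usually be followed for roughly five half-lives, which is what limits the method to deposits about a century old.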
Lead metallurgy The extraction of lead from ore, its subsequent purification and processing, and its alloying with other metals to achieve desired properties. Possibly the oldest metal known, lead was particularly attractive to alchemists, predecessors of today’s metallurgists and chemists, as a substance which they believed could be transformed into gold. There are several ores of lead; the most important is galena, PbS. Other ores are cerussite, PbCO3, and anglesite, PbSO4. Lead ores are often accompanied by zinc ores, and galena usually contains profitable quantities of silver and gold. See CERUSSITE. Specifications and grades. There are many types and several specifications for lead. In the United States, these include ASTM B29 and Federal Specification QQ-L-171; in Germany, DIN Standard 1719;
Chemical requirements for various grades of lead
The table specifies, for each grade, maximum (and in a few cases minimum) percentages of Ag, Cu, Ag + Cu, As, Sb, Sn, As + Sb + Sn, Zn, Fe, Bi, Cd, Co, and Ni, together with a minimum lead content. The grades covered are the United States ASTM B29 grades (corroding, chemical, acid-copper, and common desilverized lead), the United Kingdom BS 334 types, and the German DIN 1719 grades (Pure Pb 99.99 and Pure Pb 99.985); the specified minimum lead contents range from 99.85 to 99.99%. NS = not specified; open to negotiation between supplier and customer.
and in the United Kingdom, BS 334. The lead refined to meet these specifications is usually primary lead, that is, lead which is extracted and refined from ore. Secondary lead, recycled mostly from old storage batteries, can be refined to meet the same specifications as the primary metal, but the remelting, drossing, and recasting involved may have a negative impact on composition. In the United States most lead ore is smelted and refined to a minimum purity of 99.85%. At and above this level of purity, four different grades of lead are recognized by ASTM B29: corroding, chemical, acidcopper, and common desilverized lead. The major differences among these grades are the allowable concentrations of copper, silver, and bismuth. Even trace amounts of these elements can have a significant effect on the properties or cost of the lead and justify having the four grades. Other elements whose concentrations are generally kept low include antimony, arsenic, tin, zinc, and iron. The chemical composition requirements for the four ASTM grades of lead are given in the table. The illustration shows an abridged flowsheet of lead metallurgy. Concentrating. The first step in the beneficiation of ores to raise the lead content and to separate the lead from the zinc and iron minerals is concentrating. The standard processing begins with crushing the raw ore, followed by wet grinding, which reduces the product to a particle size of 75% minus 325 mesh. The resulting slurry is conditioned with certain reagents to establish a proper alkalinity, and by further mixing with flotation chemicals that collect the lead minerals in a froth, which is thickened and filtrated. Reagents include sodium carbonate, lime, copper sulfate, pine oil, cresylic acid, xanthate, and sodium cyanide, among others, in amounts ranging
from 0.05 to 5.0 lb per ton (0.25 to 2.5 kg per metric ton) of ore. See FLOTATION. During the lead flotation process, zinc minerals, iron compounds, and earth components of the ore are depressed instead of floated by this treatment, and are recovered after further separations. Copper, silver, and gold, if present, normally remain with the lead and are removed and recovered in the refinery. However, copper, if present in sufficient quantity, may be separated from the lead-copper concentrates by special flotation procedures. The lead concentrate produced in this first processing step has a lead metal content of about 70%. Smelting. Before smelting, lead concentrates are frequently blended with high-grade raw ores or returned intermediates (such as flue dusts, limerock, and so forth) which are drawn from proportioning bins. These materials are pelletized so that a homogeneous and carefully sized smelter feed is provided. The feed is then sintered to eliminate most of the sulfur and to agglomerate the particles into relatively large, hard lumps that will not be blown out of a blast furnace. Sinter machines consist of a chain of moving pallets on which the porous feed is ignited. The ore pellets are subjected to an air blast (downdraft or updraft) which burns the sulfur, creating sulfur dioxide, and at the same time oxidizes the metallic elements. The sinter product, with about 9% of its weight of carbon in the form of coke, is charged into the top of a blast furnace. This is a simple vertical shaft that could be as high as 23.3 ft (7.11 m) and with a 5.5-ft (1.68-m) diameter. The coke supplies the fuel for melting the charge, and also the reducing gas which reacts with the lead oxide to form metallic lead. As the charge descends in the furnace, the molten metal flows to the bottom, from where it is
withdrawn for further treatment. The remainder of the charge forms slag that floats on the lead and is removed from the furnace at a higher level. An improvement is the continuous tapping of slag, which eliminates the high labor requirement for the intermittent operations of the past. The lead bullion contains the silver, gold, copper, bismuth, antimony, arsenic, tin, and other minor metals in the ore; the slag carries the zinc, iron, silica, lime, and other gangue; the dust contains a variety of elements, including cadmium and indium, plus much lead and zinc. A typical blast furnace takes about 750 tons (680 metric tons) of sinter per day and produces some 250 tons (230 metric tons) of lead bullion. See SINTERING.
Refining. The impure bullion is cooled in kettles to about 660◦F (350◦C), causing a dross to form which carries almost all of the copper as well as major amounts of lead. Most of the copper segregates in the kettle at low temperatures because the solubility of copper in lead, just above the latter’s melting point, is very low. After the dross, which has a high copper content, is skimmed off, the rest of the copper is removed by stirring sulfur into the bath. At this stage, the work lead has been decoppered down to 0.01%. If significant quantities of tin are present, the lead bullion may be reheated to 1100◦F (600◦C), with the introduction of air, and a second dross containing the tin is removed. The lead bullion is then sent on to the refinery.
Lead metallurgy Slag from the blast furnace is frequently treated by fuming out the zinc and remaining lead. The barren slag is discarded. The dusts from the sinter plant and blast furnace are collected in a baghouse or precipitator, and returned to the mixing bins or proportioning bins. Decopperized lead bullion, containing significant amounts of silver, gold, and other materials, is refined again by one of two principal processes. Most of the world’s output is produced by means of pyrometallurgical techniques, while the remainder—less than 20%—is electrolytically refined. The latter process is used only when the bismuth content of the lead bullion is relatively high. The first step in the pyrometallurgical operation is to soften the metal by removing the arsenic, antimony, and tin. This can be done with air oxidation of these elements in a small reverberatory furnace at a temperature of 1300–1380◦F (700–750◦C). The slag, which is normally high in antimony and lead content, is reduced with coke, and the resulting antimonial lead alloy is marketed. Another widely used method of removing these impurities is the Harris process. In this operation, liquid lead bullion is sprayed through molten caustic soda and molten sodium nitrate. The latter reagent oxidizes the arsenic, antimony, and tin, which are converted into sodium salts and skimmed from the bath. The three elements can be recovered from these compounds by wet chemical methods. After the refining process has been carried to this point, the silver and gold are separated, using the Parkes zinc-desilverizing process. This method is based on the principle that when molten lead is saturated with an excess of zinc the precious metals become insoluble in lead. Intermetallic compounds with zinc are formed, and these float to the surface as a solid crust or dross which can be skimmed off. After two or three such treatments, the lead is free of silver and gold and proceeds to the next operation. The desilverized lead is essentially a eutectic melt of lead and zinc, and the latter is now usually extracted by vacuum dezincing, in which the main part of the zinc is evaporated under vacuum at 1100◦F (600◦C) and recovered in metallic form. Some plants dezinc the lead by passing chlorine through the metal, forming zinc chloride. The residue is essentially an impure alloy of silver and gold. After an oxidation treatment in a cupel furnace, the now partly refined alloy, termed dor´e metal, is electrolytically parted into refined silver and gold. Grades of lead with a high bismuth content are unsuitable for some purposes, and so removal of the bismuth is economical for contents above 0.1%. Minor quantities of bismuth are likely when lead is refined by pyrometallurgical means. The bismuth may be eliminated by the Betterton-Kroll process, which involves treatment of the melt with small amounts of calcium and magnesium. These combine with the bismuth and rise to the surface as a dross which is readily skimmed. There are other methods for reducing the bismuth content to as little as 0.001% by precipitation with sodium. Removal of the bismuth
is also possible by electrolytic refining of lead, but economically so only when the bismuth content exceeds 0.5%. Lead produced by the processes described, or by others, can have a purity of 99.90 to 99.99%. At this level the grades of lead differ only by their bismuth content. Smelting techniques make it possible to produce commercial lead of any desired degree of purity, independent of the nature of the raw material. Lead with so-called six nines purity (99.9999) can be provided for special situations or scientific research. See ELECTROLYSIS; PYROMETALLURGY, NONFERROUS. Alloys. If a user requires certain alloys to achieve desired properties, the elements are added later. In general, addition of small amounts of other metals increases the hardness and tensile strength of lead and decreases its ductility, malleability, and density. Often the tendency for lead to creep, its melting point, and the grain size are also reduced. Subsequent additions of the same metal do not usually provide a linear improvement in the mechanical properties. After a certain amount of alloying metal has been added, further additions often create alloys which are weaker and more brittle. Another factor which depends on alloy concentrations is the tendency of certain alloys to improve in time due to age hardening. However, alloys that age-harden are also susceptible to overaging, which turns strong alloys into the original soft leads, with the alloying metal ending up in segregated concentrations. Heating can accelerate both age hardening and overaging. The percentage of alloying metal which will provide a specific amount of improvement in mechanical properties depends in part upon the difference between the atomic radius of lead and that of the alloying metal. It also depends on whether the alloy is heat-treated and whether it is formed by casting, rolling, or extruding. Usually the greater the difference in atomic radii between the lead and the second metal, the greater the improvement in mechanical properties for the same percentage addition. The effect of adding a third or fourth metal to a binary lead alloy can be too complex to predict. However, one indication of final results is the effect of adding the first alloying metal. In binary, ternary, and higher alloys, there are two other factors, besides a diminishing rate of mechanical improvement and increasing brittleness, which keep the optimal alloy concentrations low. These are cost and the loss of corrosion resistance. Silver, for example, improves corrosion resistance of lead in electrolytic processes, but its cost militates against broad use. Lithium, on the other hand, can make lead more vulnerable to corrosion in electrolytic processes. The amount of lithium that can be added to lead, therefore, can be limited by the unfavorable change in corrosion resistance. The alloys of lead most frequently used in applications requiring high corrosion resistance are those which contain antimony, silver, tin, calcium, and copper. The addition of 1–13% antimony to lead creates alloys that have much greater tensile strength,
resistance to fatigue, and hardness than pure lead. This is why antimonial lead is often called hard lead and the removal of antimony in lead refining is called softening. Most antimonial lead is used for storage-battery components, and contains 3–8% antimony. Alloys with antimony levels of 0.80–1.15% and 6–8% are used most frequently in nonbattery applications. The low-end alloys are used to produce cable sheathing, while the 6–8% antimony alloys are used to fabricate a wide variety of equipment, such as tank linings, pipe, and anodes for chromium plating. Lead alloys with higher percentages of antimony are used to make castings when hardness is important. However, such alloys are rarely used because they are very brittle and do not have high corrosion resistance. The addition of small amounts of calcium and tin to lead creates alloys which have significantly increased mechanical strength and which age-harden at room temperatures. The corrosion resistance of these alloys also is higher than that of antimony alloys in many applications. Calcium alloys, with or without tin, are used as battery grids, anodes, and roofing materials. Tin alloys of lead have much lower densities and melting points than pure lead. These properties continue to decrease as more tin is added until the eutectic composition (approximately 63% Sn, 37% Pb) is reached. The lower melting point of these lead-tin alloys and increased strength have made them especially important as solders. The alloys used primarily for corrosion resistance, however, are low in tin content, with the most important alloy, terne, usually having a composition of 12–20% Sn. Terne is used to coat steel for use in items like roofing and automotive fuel tanks. See CORROSION; LEAD; LEAD ALLOYS; METAL, MECHANICAL PROPERTIES OF. A. L. Ponikvar Bibliography. ASM Metals Handbook, vol. 11, pp. 493–522, 9th ed., 1979; D. H. Beilstein, The Herculaneum lead smelter of St. Joe Minerals Corporation, Proceedings of the AIME World Symposium on Mining and Metallurgy, Lead and Zinc, St. Louis, 1970; T. R. A. Davey and W. R. Bull, Process research on lead and zinc extraction, Proceedings of the AIME World Symposium on Mining and Metallurgy, Lead and Zinc, St. Louis, 1970; W. Hofmann, Lead and Lead Alloys, 1970; M. L. Jaeck (ed.), International Symposium on Primary and Secondary Lead Processing, vol. 15, 1989; Lead for Corrosion Resistant Applications, Lead Industries Association, Inc., 1974; A Metal for the Future, St. Joe Minerals Corp., 1975; D. O. Rausch et al. (eds.), Lead-Zinc Update, 1977.
Leaf A lateral appendage which is borne on a plant stem at a node (joint) and which usually has a bud in its axil. In most plants, leaves are flattened in form, although they may be nearly cylindrical with a sheathing base as in onion. Leaves usually contain chlorophyll and are the principal organs in which the important processes of photosynthesis and transpiration occur.
Morphology A complete dicotyledon leaf consists of three parts: the expanded portion or blade; the petiole which supports the blades; and the leaf base. Stipules are small appendages that arise as outgrowths of the leaf base and are attached at the base of the petiole (Fig. 1). Stipules may be green and bladelike, as in pea; coarse rigid spines, as in black locust; sheaths, as in smart-weed; or more often, temporary structures which function only to protect the leaf while it is folded in the bud. Leaves that have a blade and petiole but no stipules are said to be exstipulate. Some leaves have no apparent petiole and are described as sessile. The leaves of monocotyledons may have a petiole and a blade, or they may be linear in shape without differentiation into these parts; in either case the leaf base usually encircles the stem. The leaves of grasses consist of a linear blade attached to the stem by an encircling sheath. At the junction of the sheath and the blade is a collarlike structure called a ligule. Arrangement (phyllotaxy). Leaves are borne on a stem in a definite fixed order, or phyllotaxy, according to species (Fig. 2). Phyllotaxy is usually helical (spiral), with a single leaf at each node. Leaves may be borne in a compact arrangement as in rosette plants, or if the stem between the nodes (the internode) elongates so that successive leaves are widely separated, the plant is referred to as caulescent. If two leaves are borne on each node, the arrangement is opposite; if successive pairs of opposite leaves are
Fig. 1. Leaf parts in different structural patterns. (a) Complete (stipulate and petiolate). (b) Exstipulate. (c) Sessile. (d) Expanded monocotyledon leaf with sheathing base. (e) Grass leaf. (f) Detail of e.
Fig. 2. Leaf arrangement. (a) Helical (top view). (b) Helical with elongated internodes (alternate). (c) Opposite (decussate). (d) Whorled (verticillate).
Fig. 3. Leaf types. (a) Simple. (b) Trifoliate. (c) Palmately compound. (d) Odd-pinnately compound. (e) Even-pinnately compound. (f) Decompound.
borne at right angles, it is decussate. When more than two leaves are present at a node, the arrangement is whorled (verticillate). Types. A leaf is simple when it has only one blade. It is compound when the blade is divided into two or more separate parts called leaflets: trifoliolate if it has three leaflets (Fig. 3); palmately compound if leaflets originate from a common point at the end of the petiole; pinnately compound if leaflets are borne on the rachis (continuation of the petiole); odd-pinnate if such a compound leaf is terminated by a leaflet; even-pinnate if it is without a terminal leaflet; decompound (bipinnate) if it is twice compound. Shapes. A leaf may be linear, long and narrow, with the sides parallel or nearly so (Fig. 4); lanceolate, narrow but tapering from base toward apex; oblanceolate, broader at the apex, tapering toward the base; spatulate, broad and obtuse at the apex, tapering to a narrow base; ovate, egg-shaped and broadest toward the base; obovate, the reverse of ovate, broadest toward the apex; elliptic, broadest at the middle and tapering slightly to a broadly rounded base and apex; oblong, somewhat rectangular, with nearly straight sides and a rounded base and apex; deltoid, triangular; reniform, kidney-shaped, broader than it is long; orbicular, circular or nearly so; peltate, shield-shaped, usually a circular leaf with the petiole attached at or near the center of the lower surface; perfoliate, having the stem apparently passing through it; connate, when the bases of two opposite leaves seem to have fused around the stem. Margins. The margin, or edge, of a leaf may be entire (without indentation, or teeth); serrate (with sharp teeth pointing forward), or serrulate (finely serrate); dentate, with coarse teeth pointing outward, or denticulate (finely dentate); crenate or scalloped,
with broad rounded teeth; undulate, with a wavy margin; incised, cut into irregular or jagged teeth or segments (if segments are narrow and pointed, laciniate; if directed backward, runcinate); pinnatifid, deeply pinnately parted (featherlike); or dissected, cut into numerous slender, irregularly branching divisions (Fig. 5). When the blade is deeply cut into fairly large portions, these are called lobes. The degree of such lobing may be designated by the following terms: lobed, with sinuses usually not more than halfway from margin to midrib (midvein) or base, and with lobes and sinuses more or less rounded; cleft, when incisions extend halfway or more from margin to midrib, especially when they are sharp; parted, cut so deeply that the sinuses extend almost to the midrib or base; divided, cut entirely to the midrib, making a leaf compound. Tips and bases. The tip of a leaf may be acuminate, gradually tapering to a sharp point; acute, tapering more abruptly to a sharp point; obtuse, with a blunt or rounded tip; truncate, seeming to be cut off square or nearly so; emarginate, decidedly notched at the
Fig. 4. Leaf shapes. (a) Linear. (b) Lanceolate. (c) Oblanceolate. (d) Spatulate. (e) Ovate. (f) Obovate. (g) Elliptic. (h) Oblong. (i) Deltoid. (j) Reniform. (k) Orbicular. (l) Peltate. (m) Perfoliate. (n) Connate.
Fig. 5. Leaf margins of various types. (a) Entire. (b) Serrate. (c) Serrulate. (d) Dentate. (e) Denticulate. (f) Crenate. (g) Undulate. (h) Incised. (i) Pinnatifid. (j) Dissected. (k) Lobed. (l) Cleft. (m) Parted.
Fig. 6. Leaf tips and bases. (a) Acuminate. (b) Acute. (c) Obtuse. (d) Truncate. (e) Emarginate. (f) Mucronate. (g) Cuspidate. (h) Cuneate. (i) Oblique. (j) Cordate. (k) Auriculate. (l) Sagittate. (m) Hastate. (n) Clasping.
tip but not lobed; mucronate, abruptly tipped with a small short point; or cuspidate, ending in a sharp rigid point (Fig. 6). The base of the blade may be cuneate, wedgeshaped; oblique, the two sides of the base unequal; cordate, heart-shaped, with a conspicuous sinus; auriculate, with a small earlike lobe on either side of the petiole; sagittate, arrow-shaped, with a pair of basal lobes turned outward; or clasping, sessile and partly investing the stem. Venation. The arrangement of the veins, or vascular bundles, of a leaf is called venation (Fig. 7). As seen with the unaided eye, the venation appears in three basic patterns: dichotomous, reticulate (netted), and parallel. In dichotomous venation the veins are forked, with each vein dividing at intervals into smaller veins of approximately equal size; this pattern is characteristic of most ferns, gingko, and some primitive angiosperms. Reticulate venation may be described as a branching system with successively thinner veins diverging as branches from the thicker veins. The main veins may form a pinnate or palmate pattern, while the small veins form a fine network. In parallel-veined leaves, veins of relatively uniform size are oriented longitudinally, or nearly so, depending on the degree of expansion of the blade. The main longitudinal veins are usually interconnected with small veins. Reticulate venation is most common in
Fig. 7. Leaf venation. (a) Dichotomous. (b) Pinnate reticulate. (c) Palmate reticulate. (d) Parallel (expanded leaf). (e) Parallel (linear leaf).
dicotyledons, parallel venation in monocotyledons. See GINKGOALES; LILIOPSIDA; MAGNOLIOPSIDA; POLYPODIALES. Surface. Surfaces of leaves provide many characteristics that are used in identification. A surface is glabrous if it is smooth or free from hairs; glaucous if covered with a whitish, waxy material, or “bloom”; scabrous if rough or harsh to the touch; pubescent, a general term for surfaces that are hairy as opposed to glabrous; puberulent if covered with very fine, downlike hairs; villous if covered with long, soft, shaggy hairs; hirsute if the hairs are short, erect, and stiff; and hispid if they are dense, bristly, and harshly stiff. Texture. The texture may be described as succulent when the leaf is fleshy and juicy; hyaline if it is thin and almost wholly transparent; chartaceous if papery and opaque but thin; scarious if thin and dry, appearing shriveled; and coriaceous if tough, thickish, and leathery. Duration. Leaves may be fugacious, falling nearly as soon as formed; deciduous, falling at the end of the growing season; marcescent, withering at the end of the growing season but not falling until toward spring; or persistent, remaining on the stem for more than one season, the plant thus being evergreen. See DECIDUOUS PLANTS; EVERGREEN PLANTS. Anatomy The foliage leaf is the chief photosynthetic organ of most vascular plants. Although leaves vary greatly in size (from less than 0.04 in. or 1 mm in the duckweeds to over 66 ft or 20 m in some palms) and form (Figs. 1–7), they share the same basic organization of internal tissues and have similar developmental pathways. Like the stem and root, leaves consist of three basic tissue systems: the dermal tissue system, the vascular tissue system, and the ground tissue system. However, unlike stems and roots which usually have radial symmetry, the leaf blade usually shows dorsiventral symmetry, with vascular and other tissues being arranged in a flat plane. Stems and roots have apical meristems and are thus characterized by indeterminate growth; leaves lack apical meristems, and therefore have determinate growth. Because leaves are more or less ephemeral organs and do not function in the structural support of the plant, they usually lack secondary growth and are composed largely of primary tissue only. See APICAL MERISTEM; ROOT (BOTANY); STEM. The internal organization of the leaf is well adapted for its major functions of photosynthesis, gas exchange, and transpiration. The photosynthetic cells, or chlorenchyma tissue, are normally arranged in horizontal layers, which facilitates maximum interception of the Sun’s radiation. The vascular tissues form an extensive network throughout the leaf so that no photosynthetic cell is far from a source of water, and carbohydrates produced by the chlorenchyma cells need travel only a short distance to reach the phloem in order to be transported out of the leaf (Figs. 8 and 9). The epidermal tissue forms a continuous covering over the leaf so that undue
Fig. 8. Three-dimensional diagram of internal structure of a typical dicotyledon leaf.
water loss is reduced, while at the same time the exchange of carbon dioxide and oxygen is controlled. Epidermis. The epidermis is usually made up of flat tabular cells which may be elongate, as in the linear leaves of grasses, or more or less square or lobed in surface view, as in many broad-leaved dicotyledons (Fig. 8 and Fig. 10). Regardless of shape, epidermal cells always fit tightly together without intercellular spaces, and they secrete a layer of hydrophobic substances, such as cutin and waxes, on the outside surface. Both of these adaptations reduce water loss. The layer of cutin is referred to as the cuticle. The epidermal cells usually contain colorless plastids rather than chloroplasts, so that the epidermis tends to be a clear unpigmented layer of cells which allows light to penetrate to the subjacent photosynthetic tissue. The gas exchange required for photosynthesis takes place through thousands of minute stomates which usually occur at densities of about 0.15–0.75 in.2 (100–500/mm2) [Fig. 10 ]. Each stomatal apparatus consists of a pore surrounded by two guard cells and associated subsidiary cells, which are sometimes morphologically differentiated from other epidermal cells. In the vast majority of vascular plants, the stomatal pore is closed at night so that evaporation of water vapor from internal air spaces is reduced, while in the light the guard cells are deformed in such a way that the pore is opened and carbon dioxide may enter the leaf. The ingenious mechanism of stomate opening and closure is as yet incompletely understood, but it appears to involve the pumping of potassium ions from the subsidiary cells to the guard cells, raising the osmotic concentration and increasing the hydrostatic pressure within the guard cells. The circumferential arrangement of cellulose microfibrils in the guard cell wall allows longitudinal expansion only; in addition,
the ends of the guard cells are fixed, so that with increase in pressure the cells bend away from each other, opening the pore. The guard cells are unique in that they do possess chloroplasts, which provide a ready source of energy for this process. Stomates may occur on both the upper and lower leaf surfaces; or they may be restricted to the lower, shaded epidermis. In some cases such as the floating leaves of the water lily, stomates are present only in the upper epidermis. The leaf epidermis of many vascular plants bears specialized hairlike structures called trichomes. These range from the simple prickle hairs on the leaves of many grasses, to the multicellular glandular hairs which give plants like geranium and tomato their characteristic odor, and to the complex highly branched hairs of plants like the wooly mullein. Except in obvious cases such as the modified trichomes of Venus’ flytrap which secrete digestive
Fig. 9. Leaf cross sections. (a) Beech, Fagus grandifolia. (b) Atriplex rosea. (c) Eucalyptus sp. (d) Corn, Zea mays.
Fig. 10. Sections of beech (Fagus grandifolia) cut parallel to the surface (paradermal sections). (a) Upper epidermis. (b) Palisade parenchyma. (c) Spongy parenchyma. (d) Lower epidermis. (From N. G. Dengler, L. B. MacKay, and L. M. Gregory, Cell enlargement and tissue differentiation during leaf expansion in beech, Fagus grandifolia, Can. J. Bot., 53:2846–2865, 1975)
enzymes, the actual function of these widespread trichomes has not been demonstrated experimentally. Trichomes are believed to reduce water loss through the cuticle and stomates by creating a boundary layer on the leaf surface. A dense layer of trichomes may also reflect solar radiation from the leaf, protecting internal enzymes from damaging high temperature. In some cases trichomes may prevent insects, snails, and slugs from eating the leaves. See EPIDERMIS (PLANT). Mesophyll. The ground tissue of leaves is referred to as mesophyll and consists of loosely packed parenchyma cells separated by intercellular spaces (Figs. 8 and 9). Within these cells are numerous green chlorophyll-containing chloroplasts, which are the site of photosynthesis. The mesophyll is frequently differentiated into palisade parenchyma and spongy parenchyma. The palisade parenchyma consists of one or more layers of columnar cells elongated at right angles to the epidermal layer, while the spongy parenchyma is made up of highly lobed cells with an extensive network of intercellular air spaces. Often, the largest intercellular spaces in the leaf occur in the mesophyll adjacent to a stomate, and are called substomatal chambers. In leaves with a horizontal orientation, the palisade cells usually lie next to the upper epidermis, while in leaves with a vertical orientation, layers of palisade may be adjacent to both epidermal layers, or as in the leaves of many grasses, there may be no differentiation into palisade and spongy parenchyma (Fig. 9). See PARENCHYMA. Vascular tissue. The major veins of the leaf provide the connection between the vascular bundles of the stem and petiole and the xylem and phloem of the minor veins embedded in the mesophyll of the blade. Each major vein is associated with a rib of parenchymatous tissue, which usually projects from the lower
surface of the leaf and can be seen with the naked eye. Strengthening tissues such as collenchyma and sclerenchyma are often associated with the major vein and provide the chief means of support for the leaf blade. Regardless of the pattern of the major veins (Fig. 6), the distribution of minor veins is such that every photosynthetic mesophyll cell is usually less than 100 micrometers away from the conducting tissues of the minor veins. Xylem occurs on the adaxial side of the minor veins and consists of both parenchyma and tracheary elements, which often have helical secondary walls (Fig. 10). Phloem occurs on the abaxial side of the vein and consists of sieve tube elements and phloem parenchyma. The conducting tissues at the minor veins are enclosed in a continuous sheath of ground tissue, the bundle sheath. The bundle sheath may be sclerenchymatous, but it more often consists of parenchyma cells with poorly developed chloroplasts. In many leaves the minor veins are connected with the epidermis by panels of bundle sheath cells called bundle sheath extensions. In leaves with freely ending veinlets, these extensions pass over the end of the veinlets so that the vascular tissue is not exposed directly to the intercellular spaces. In some plants, guttation (secretion of water) takes place from specialized vein endings and associated tissues called hydathodes. See PHLOEM; XYLEM. Internal structure in relation to transport. Transpiration water leaving the tracheary elements must cross the bundle sheath parenchyma to reach the photosynthetic mesophyll. The bulk of the water is believed to travel either through the apoplast, that is, outside the living protoplast, or through the walls of the intervening cells. The evaporation of water from the cell walls of the mesophyll into the intercellular spaces provides the driving force for the bulk movement of the continuous column of water that extends from the soil through the xylem tracheary elements of root, stem, and leaf to the mesophyll cell walls. It is the cohesive force of this column which holds all of these together. In addition to moving directly from the xylem to the mesophyll, water moves from the veins through the bundle sheath extensions and then laterally along the epidermis. This pathway is particularly important for the palisade parenchyma cells which have limited lateral contact with one another. The bulk of the water lost from the leaf by transpiration evaporates from the surface of the internal tissues of the leaf, often 10 to 30 times the area of external leaf surface. Water vapor escapes from the leaves through stomatal pores and, to a much lesser extent, through the cuticle. See PLANT-WATER RELATIONS. The photosynthetic mesophyll cells act as the source of carbohydrates for the rest of the plant. Sugars are believed to move in a symplastic pathway, a pathway in which materials move from cell to cell through cytoplasmic connections called plasmodesmata. Materials following this pathway do not leave the living protoplast as they move from the mesophyll, across the bundle sheath, and into the sieve tube elements of the phloem. The process of
Fig. 11. Sections of developing leaves of Calycanthus occidentalis. (a) Longitudinal section of leaf primordium. (b) Cross section of leaf primordium. (c) Cross section of young leaf blade. (From N. G. Dengler, Ontogeny of the vegetative and floral apex of Calycanthus occidentalis, Can. J. Bot., 50:1349–1356, 1972)
loading carbohydrates against a concentration gradient into the sieve tube elements requires metabolic energy. The parenchyma cells surrounding the conducting cells often display features of transfer cells—cells with extensive wall invaginations which vastly add to the surface area of the cell membrane. See PLANT TRANSPORT OF SOLUTES. C4 plants. In most monocotyledonous and dicotyledonous species, a three-carbon compound, 3-phosphoglyceric acid, is the first product of photosynthesis. These are called C3 plants. However, since the discovery of a new photosynthetic pathway in the 1960s, a growing number of plants have been shown to initially produce four-carbon organic acids as the first products of photosynthesis. These C4 plants are normally characterized by a distinctive type of leaf anatomy in which the bundle sheath parenchyma is made up of conspicuous dark green cells containing a number of large chloroplasts localized at one end of the cell (Fig. 9b and d). In addition, the photosynthetic mesophyll cells are often arranged concentrically around the bundle sheath in C4 plants, giving rise to the designation Kranz (wreath) anatomy for these species. Plants with the C4 pathway are more efficient in their incorporation of carbon dioxide than C3 plants, particularly at high light intensities and high temperatures. See PHOTOSYNTHESIS. Development
The foliage leaf, regardless of size and shape at maturity, follows the same basic sequence of developmental events in its formation. Initiation. The leaf arises as an emergence on the flanks of the apical meristem in the shoot tip. Localized cell divisions and accompanying cell enlargement first give rise to a small mound of cells and then to a fingerlike projection called a leaf primordium (Fig. 11a). The new leaf primordium acts as a growth center, drawing nutrients from surrounding tissues and producing growth regulators such as auxin. Microsurgical experiments have shown that the growth center induces the differentiation of a strand of future vascular tissue called procambium which is in continuity with the vascular system of the stem. The procambium continues to grow toward the tip of the primordium as the primordium itself grows in length. The leaf primordium axis later forms the petiole and midrib of the mature foliage leaf, and the first procambial strand becomes the midvein of the leaf. Initially the leaf primordium elongates through concentrated meristematic activity near the apex; eventually, cells at the primordium apex begin to mature, and further growth in length is the result of cell divisions scattered throughout the young leaf. See BUD. Blade formation. The leaf blade is usually initiated when the leaf primordium is less than a half millimeter in length. The blade arises as a ridge of tissue produced by a zone of cell division and cell enlargement on either side of the primordial axis called marginal meristems (Fig. 11b). Analysis of the planes of divisions in the marginal meristems has shown that cell divisions in the surface layer are largely perpendicular to the leaf surface, giving rise to a continuous sheet of cells which will later mature as epidermis. Cell divisions in the internal tissues occur in a variety of planes, adding to the volume of the developing leaf blade, and determining the number of cell layers in the mature leaf. Cell divisions in both the surface and internal layers are most frequent in a region several cells removed from the actual leaf margin. In many plant species, the marginal meristems are active for only a limited period of time, and expansion of the leaf blade is the result of cell division and enlargement throughout the blade between the margin and midrib called plate meristem activity. The plane of cell divisions in the plate meristem is predominantly perpendicular to the leaf surface. This means that the number of cell layers established by the marginal meristem is perpetuated, and that with cell enlargement, the leaf expands in size manyfold in a horizontal plane. At the time that the leaf expands from the bud, between 50 and 90% of its cells have already been formed. Therefore, the final stages of leaf expansion are mostly the result of enlargement of cells that have been formed by divisions in the plate meristem. Characteristic differences in the rate, duration,
Fig. 12. Sections of beech (Fagus grandifolia) leaves during expansion from the winter bud. (a) Paradermal section of spongy parenchyma at time of bud swelling. Arrows indicate cell divisions in plate meristem. (b) Cross section of leaf at same stage. (c) Cross section of leaf 1 week later. (d) Cross section of leaf 2 weeks later. (From N. G. Dengler, L. B. MacKay, and L. M. Gregory, Cell enlargement and tissue differentiation during leaf expansion in beech, Fagus grandifolia, Can. J. Bot., 53:2846–2865, 1975)
and direction of cell enlargement occur between the various tissues of the leaf blade. For instance, cells that will become the palisade parenchyma elongate greatly in a vertical direction, and very little in a horizontal direction (Fig. 12); in addition, cell divisions occur for a longer period in the palisade layers. In contrast, cells of the upper epidermis undergo a great deal of horizontal expansion and very little vertical expansion. The intercellular spaces of the palisade parenchyma layers are formed when the enlargement of the upper epidermal cells draws the attached palisade cells apart. Cell expansion in the spongy parenchyma layers does not occur uniformly over the cell surface, resulting in a lobed cell shape and an extensive network of intercellular spaces. The procambial strands that will differentiate as minor veins are formed from localized cell divisions in the plate meristem (Fig. 12). The minor vein pattern is blocked out early during plate meristem activity, and the areoles, regions of mesophyll surrounded by vascular tissue, are well defined throughout the period of leaf expansion. Maturation of xylem and phloem occurs first in the major veins, starting at the base of the leaf and proceeding toward the apex. This is followed by maturation of the xylem and phloem of the minor veins, which proceeds from the apical portion of the leaf blade toward the base. See PLANT GROWTH. Variations in developmental pattern. The process of leaf development in plants with lobed or compound leaves does not differ qualitatively from the mechanisms described above. Rather, differences in leaf shape arise through differences in the location and direction of marginal and plate meristem activity. The
long linear leaves of grasses and some other monocotyledons, however, do possess some unique developmental features. At the earliest stages of development, the leaf primordium is crescent-shaped and encircles the shoot apical meristem. After a short period of apical growth along the rim of the crescent, most of the meristematic activity is localized at the base of the leaf in the intercalary meristem. A majority of cell divisions are in a plane which is perpendicular to the future long axis of the leaf. This means that cells of all tissue layers are formed in long files, with those cells near the apex of the leaf reaching maturity while new cells are being added to the base of each file. Intercalary growth may be prolonged, and may occur sporadically throughout the life of an individual leaf. This unique pattern of growth is often regarded as an adaptation which is associated with the evolution of grazing mammals. Leaf development in rapidly growing herbaceous plants is a continuous process in which leaves produced on the shoot apical meristem expand, carry on photosynthesis, and eventually senesce when the reproductive stage of the life cycle nears an end. In most perennial plants such as the beech illustrated in Figs. 9, 10, and 12, new leaf primordia are formed late in the growing season and overwinter in the apical bud. In the spring the young leaves can quickly complete plate meristem activity and leaf expansion so as to maximize the period of photosynthesis. Leaf abscission occurs at the end of the growing season in deciduous woody plants. The abscission zone may be structurally weak, but the actual separation of leaf from stem is the result of enzymatic degradation of cell walls. A protective layer is produced by the deposition of protective substances such as suberin and wound gum, and the wound is often further protected by the development of a layer of cork. The resulting leaf scar and the bundle scars which mark the position of the abscission of the leaf midvein are often distinguishing features in the identification of species. See ABSCISSION; PERIDERM; PLANT ANATOMY. Nancy G. Dengler
Lean manufacturing A unique linked-cell manufacturing system. Initiated in the 1960s by the Toyota Motor Company, it is also known as the Toyota Production System (TPS), the Just-in-Time/Total Quality Control ( JIT/TQC) system, or World Class Manufacturing (WCM) system. In 1990, it was given a name that would become universal, “lean production.” This term was coined by John Krafcik, an engineer working in the International Motor Vehicle program at MIT with J. P. Womack, D. Roos, and D. T. Jones, who observed that this new system used less of the key resources needed to make goods. What is different about the system is its use of manufacturing cells linked together with a functionally integrated system for inventory and production control. The result is low cost (high efficiency), superior quality, and on-time delivery of unique products from a flexible system. See FLEXIBLE MANUFACTURING
SYSTEM; INVENTORY CONTROL; MANUFACTURING ENGINEERING.
Mass versus lean production. The key proprietary aspects of lean production are U-shaped manufacturing and assembly cells that use walking workers and are designed with system requirements in mind. Design decoupling allows the separation (decoupling) of processing times for individual machines from the cycle time for the cell as a whole, permitting the lead time to make a batch of parts to be independent of the processing times for individual machines. This takes all the variation out of the supply chain lead times, so scheduling of the supply system can be greatly simplified. The supply chain is controlled by a pull system of production control called Kanban. Using Kanban, the inventory levels can be dropped, which decreases the throughput time for the manufacturing system. See SUPPLY CHAIN MANAGEMENT. Figure 1 shows an entire mass production factory reconfigured into a lean manufacturing factory. The final assembly lines are converted to mixed model final assembly; this levels the demand for subassemblies and other components. The rate of production is determined by recalculating the monthly demand into a daily demand and trying to make the same product mix every day. The subassembly lines are reconfigured into U-shaped cells. The daily output from these cells is balanced to match the demand from final assembly. The traditional job shop generates a variety of
unique products in low numbers with its functional design. It is redesigned into U-shaped manufacturing cells that produce families of component parts (Fig. 2). Lean manufacturers focus on sole-sourcing each component or subassembly (that is, they do not have multiple vendors supplying the same components), sharing their knowledge and experience in linked-cell manufacturing with their vendors on a one-to-one basis. For lean automobile manufacturers, the final assembly plant may have only 100 to 400 suppliers, with each supplier becoming a lean or JIT vendor to the company. In the future, the number of vendors supplying a lean manufacturer will decrease even more. The Mercedes-Benz plant in Vance, Alabama, has around 80 suppliers. The subassemblies contain more components as the vendors take on more responsibility for the on-time delivery of a larger portion of the auto. Lean cell design. In a true lean manufacturing system, manufacturing processes and equipment are designed, built, tested, and implemented into the manufacturing cells. The machine tools and processes, the tooling (workholders, cutting tools), and the material-handling devices (decouplers) are designed specifically for cellular manufacturing. Simple, reliable equipment that can be easily maintained should be specified. In general, flexible, dedicated equipment can be built in-house better than it can be purchased and modified for the needs of the cell.
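The daily-demand (takt or cycle time) calculation described above, together with the rule discussed later in this article that each machine's processing time (MT) must be less than the cell cycle time (CT), can be illustrated with a short computational sketch. The demand figures, shift lengths, and machining times used below are hypothetical and are not taken from this article; the sketch only shows the arithmetic.
```python
import math

# Hypothetical figures for illustration only; not from this article.
monthly_demand = {"rack A": 4200, "rack B": 2100, "rack C": 2100}   # units per month
working_days_per_month = 20
available_minutes_per_day = 2 * 420        # two 420-min shifts (assumed)

# Level the schedule: build the same product mix every day.
daily_demand = {model: qty / working_days_per_month for model, qty in monthly_demand.items()}
total_daily_units = sum(daily_demand.values())

# Takt (cycle) time: the pace at which one unit must leave the cell.
cycle_time = available_minutes_per_day / total_daily_units   # minutes per unit
print(f"daily mix: {daily_demand}")
print(f"cycle time CT = {cycle_time:.2f} min per unit")

# MT < CT rule: each station's machining time must fit within the cycle time.
machining_times = {"gundrill": 0.8, "rough mill": 0.9, "broach": 1.0, "heat treat": 4.0}
for station, mt in machining_times.items():
    if mt <= cycle_time:
        print(f"{station}: MT = {mt} min, fits within CT")
    else:
        # A slow operation (process delay) must hold several units at once
        # so that one unit is still completed every cycle.
        positions = math.ceil(mt / cycle_time)
        print(f"{station}: MT = {mt} min, needs room for {positions} units in process")
```
Leveling the mix in this way keeps the cycle time constant from day to day, which is one reason the linked cells and their kanban loops can run with small, fixed amounts of stock.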
Fig. 1. Restructuring of a job shop/flow shop final assembly design into a linked-cell manufacturing system.
Fig. 2. Job shop portions of the factory require a system level conversion to redesign them into manufacturing cells.
However, many plants that lack the expertise to build machines from scratch do have the expertise to modify equipment to give it unique capabilities, and this interim cell approach can lead a company into true lean manufacturing cells. See MATERIALS-HANDLING EQUIPMENT. Many companies understand that it is not good strategy simply to imitate manufacturing process technology from another company and then expect to make an exceptional product. When process technology is purchased from outside vendors, any unique aspects will be quickly lost. Companies must carry out research and development on manufacturing technologies and systems in order to produce effective and cost-efficient products, and the resulting products make such an investment pay off. There are many advantages to this home-built equipment strategy.
Flexibility (process and tooling adaptable to many types of products). Flexibility requires rapid changeover of jigs,
fixtures, and tooling for existing products and rapid modification for new designs. The processes have excess capacity; they can run faster if necessary, but they are designed for less than full-capacity operation. Building to need. There are three aspects to this. First, there is no unused capability or options. Second, the machine can have unique capabilities that competitors do not have and cannot get access to through equipment vendors. For example, in the lean cell shown in Fig. 3 is a broaching (cutting) machine for producing the gear teeth on a rack bar. The angle that these teeth make with the bar varies for different types of racks. Job shop broaching machines are not acceptable for this cell because of their
Fig. 3. Lean manufacturing cell for producing rack bars for a rack-and-pinion steering gear.
long changeover times, so a unique machine tool for broaching was designed and built using a proprietary process. Third, the equipment should allow the operator to stand and walk. Equipment should be the appropriate height to allow the operators to easily perform tasks standing up and then move to the next machine in a step or two (that is, the design should have a narrow footprint).
Built-in maintainability/reliability/durability. Equipment should be easy to maintain (to oil, clean, changeover, or replace worn parts, and to use standardized screws). Many of the cells at lean vendors are clones of each other. The vendor company, being the sole source, has the volume and the expertise to get business from many companies, making essentially the same components or subassemblies for many
original equipment manufacturers (OEMs). The equipment can be interchanged from one cell to another in emergencies. Machines, handling equipment, and tooling built for the needs of the cell and the system. Machines are typically single-
cycle automatics but may have capacity for process delay. An example of process delay is a heat treatment process that takes 4 min in a cell with a 1-min cycle time. If the heat treatment machine can hold four units, each getting 4 min of treatment, one unit is still produced per minute. Safety. Equipment is designed to prevent accidents (fail-safe). Equipment designed to be easy to operate, load, and unload.
Research has shown that manufacturing cells have ergonomic benefits over the job shop. Toyota ergonomists recommend unloading with the left hand, loading with the right hand, while walking right to left. The wide variety of tasks over the cell cycle time keeps the operators from developing cumulative trauma injuries. Equipment designed to process single units, not batches. Small-area, low-cost equipment is the best. Machining or processing time (MT) should be modified so that it is less than the cycle time (CT), the time in which one unit must be produced. Equipment processing speed should be set in view of the cycle time, such that MT < CT. The MT is related to the machine parameters selected. For example, in a cutting operation this approach often permits the reduction of the cutting speed, thereby increasing the tool life and reducing downtime for tool changes. This approach also reduces equipment stoppages, lengthens the life of the equipment, and may improve quality. Equipment that contains self-inspection devices (such as sensors and counters) to promote autonomation. Autonomation
is the autonomous control of quantity and quality. Often the machine is equipped to count the number of items produced and the number of defects. See QUALITY CONTROL. Equipment that is movable. Machines are equipped with casters or wheels, flexible pipes, and flexible wiring. There are no fixed conveyor lines. Equipment that is self-cleaning. Equipment disposes of its own chips and trash. Equipment that is profitable at any production volume. Equipment that needs millions of units to be profitable (R. J. Schonberger calls them supermachines) should be avoided, because if the production volume ever exceeds the maximum capacity that the first supermachine can build, it will be necessary to purchase another supermachine and the new one will not be profitable until it approaches full use. Schedulers of the equipment will typically divide the volume into two machines, so neither will be profitable. Lean cell operation. The manufacturing cell shown in Fig. 3 typically uses two operators. These standing, walking workers move from machine to machine in counterclockwise (CCW) loops, each completing the loop in about 1 min. Operator 1 typically addresses 10 stations, and operator 2 addresses 11 stations. Most of the steps involve unloading a machine, loading another part into the machine, checking the
part unloaded, and dropping the part into the decoupler (handling) elements between the machines. The stock-on-hand (SOH) in the decouplers and the machines helps to maintain the smooth flow of the parts through the machines. The decouplers also can be designed to perform inspections for part quality or necessary process delays while the parts heat up, cool down, cure, and so on. The stock-on-hand is kept as low as possible. Sometimes the decoupler elements perform the inspection of the part, but mostly they serve to transport parts from one process to the next. The decoupler may also perform a secondary operation such as deburring or degaussing (removal of residual magnetic fields). By design, one operator controls both the input and output of the cell. One operator always controls the volume of material going through the cell. This keeps the stock-on-hand quantity constant and keeps the cell working in balance with the final or subassembly lines it is feeding. At the interface between the two operators, either one can perform the necessary operations, depending on when they arrive and when the processes in the machines are finished. That is, the region where the two operators meet is not fixed, but changes or shifts depending upon the way parts are moving about the cell. This is called the relay zone. This added flexibility requires that the workers are crosstrained on all the processes in the cell. Ergonomics of lean cells. Ergonomics deals with the mental, physical, and social requirements of the job, and how the job is designed (or modified) to accommodate human limitations. For example, the machines in the cell are designed to a common height to minimize lifting of parts, transfer devices are designed for slide on/slide off, and automatic steps equipped with interrupt signaling help workers monitor the process. When the job is primarily loading/unloading, ergonomic concerns regarding lifting and placing parts in machines and operating workholding devices must be addressed. Lean manufacturing cells are relatively free of cumulative trauma syndrome (CTS) problems because the operators’ tasks and movements are so varied from machine to machine. Cell designers should try to incorporate ergonomic issues initially rather than trying to implement fixes later. Human performance in detecting and correcting cell malfunctions will establish utilization and production efficiency. The design of machines for maintainability and diagnostics is critical. In manufacturing and assembly cells where workers operate machines, it is important that all the machines are ergonomically identical. Sewing machines in a cell are a good example: To the operator, all the machines should feel the same in terms of control. Manufacturing system integration. Many believe that the only way in which manufacturing companies can compete is to automate. This approach, known as computer integrated manufacturing (CIM), was recently renamed agile manufacturing. The concept is to achieve integration through computerization and
automation. This often results in trying to computerize, robotize, or automate very complex manufacturing and assembly processes, and it works only when there is little or no variety in the products. Lean manufacturing systems take a different approach—integrate the manufacturing system, then computerize and automate (IM, C). Experts on CIM now agree that lean manufacturing, especially development of manufacturing and assembly cells, must come before efforts to computerize the system. While costs of these systems are difficult to obtain, the early evidence suggests that the lean cell approach is significantly cheaper than the CIM approach. See AUTOMATION; COMPUTER-INTEGRATED MANUFACTURING. Outlook. The lean factory is based on a different design for the manufacturing system in which the sources of variation in time are minimized and delays in the system are systematically removed. In the linked-cell manufacturing system, in which manufacturing and assembly cells are linked together with a pull system for material and information control
(Fig. 4), downstream processes dictate upstream production rates. The linked-cell manufacturing system strategy simplifies the manufacturing system, integrates the critical control functions before applying technology (automation, robotization, and computerization), avoids risks, and makes automation easier to implement. This is the strategy that will predominate in the next generation of factories worldwide. See INDUSTRIAL ENGINEERING; PRODUCTION ENGINEERING; PRODUCTION METHODS. J T. Black Bibliography. R. U. Ayres and D. C. Butcher, The flexible factory revisited, Amer. Scientist, 81:448– 459, 1993; J T. Black, Cell design for lean manufacturing, Trans. NAMRI/SME, 23:353–358, 2000; J T. Black and B. J. Schroer, Simulation of an apparel assembly cell with walking workers and decouplers, J. Manuf. Sys., 12(2):170–180, 1993; J T. Black, C. C. Jiang, and G. J. Wiens, Design, analysis and control of manufacturing cells, PED, vol. 53, ASME, 1991; Y. Monden, Toyota Production System, Industrial Engineering and Management Press, IIE, 1983; S. Nakajima, TPM, Introduction to TPM: Total Productive Maintenance, Productivity Press, 1988; R. J. Schonberger, Japanese Manufacturing Techniques: Nine Hidden Lessons in Simplicity, Free Press, 1982; K. Sekine, One-Piece Flow: Cell Design for Transforming the Production Process, Productivity Press, 1990; S. Shingo, A Study of The Toyota Production System, Productivity Press, 1989; N. P. Suh, Design axioms and quality control, Robotics and CIM, 9(4/A):367, August–October 1992; J. P. Womack, D. T. Jones, and D. Roos, The Machine That Changed The World, Harper Perennial, 1991.
Fig. 4. Linked-cell design. In the lean manufacturing system, the manufacturing and assembly cells are linked to the final assembly area by Kanban (card) inventory links or loops. WLK indicates a withdrawal card.
Learning mechanisms Those processes, activities, structural components, and alterations of the nervous system that produce the behavioral manifestations and changes due to experience and event associations (rather than those due to maturation, fatigue, and other developmental or temporary states). Central issues. The search for a neural basis of learning has long been guided by several central issues, as described below. Definition of learning. At the root of any search for the neural mechanism of learning is the problem of defining learning. Most simply, learning may be defined as a change in behavior. However, this definition is certainly too broad since any change in behavior could then be called learning whether it resulted from drug intoxication, sleep, or even death. It is now common to use the term behavioral plasticity for behavior changes that seem to involve at least some aspects of learning and to subdivide the definition into specific types of learning, such as motor learning, sensory learning, and associative learning. In this way, the definition of learning has been narrowed to indicate a change in behavior that is relatively permanent, that is not caused by development, maturation, fatigue, or other short-term processes, and that is due to experience or reinforcement. Hence,
what is known as sensitization, a process in which a given behavior increases in intensity simply with repeated occurrences, is an example of behavioral plasticity, whereas learning to play basketball involves the more specific categories of both motor and sensory learning. Several types of learning have been recognized, perhaps the simplest of which is associative learning, that is, learning by association of one stimulus with another through the repeated simultaneous occurrence of the two stimuli. This type of learning is typified by the dog that learns to associate the sound made when a package of dog meal is opened with the food being placed in its dish. This is an example of Pavlovian conditioning. Motor learning, on the other hand, occurs when an animal or human learns to perform some motor task in response to a given event or stimulus. Sensory learning can be thought of as that learning in which an organism is trained to respond to changes in some stimulus or to differences between stimuli. Included here are such familiar things as learning faces or tunes, learning to distinguish two shades of color, and connecting voices with faces. The last is an example of cross-modal generalization learning in which a stimulus in one sensory modality (the auditory) recalls the image of the face in another (the visual). In the human, there are other categories of learning, such as cognitive learning in which the human learns without any obvious external stimuli, but simply through the internal activity of the brain, commonly referred to as consciousness. This is the most difficult type of learning to investigate, since there is little control over the process. See COGNITION; CONDITIONED REFLEX.
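How repeated pairing of two stimuli can build an association may be made concrete with a small numerical sketch. The update rule below is a Rescorla-Wagner-style model of Pavlovian conditioning; the model and its parameter values are standard teaching illustrations rather than anything specified in this article, and real nervous systems are not claimed to implement this exact equation.
```python
# Minimal sketch of associative learning by repeated pairing
# (Rescorla-Wagner-style update; parameters are illustrative only).

def run_conditioning(n_trials: int, learning_rate: float = 0.2, lam: float = 1.0):
    """Return the associative strength V after each CS-US pairing.

    V grows toward lam (the maximum strength the US can support);
    on each paired trial the change is proportional to the remaining
    "surprise" (lam - V).
    """
    v = 0.0
    history = []
    for _ in range(n_trials):
        v += learning_rate * (lam - v)   # CS and US presented together
        history.append(v)
    return history

if __name__ == "__main__":
    strengths = run_conditioning(n_trials=10)
    for trial, v in enumerate(strengths, start=1):
        print(f"trial {trial:2d}: associative strength = {v:.3f}")
```
In such a model the conditioned response (for example, the dog's response to the sound of the food package) is assumed to track the associative strength V, which grows rapidly at first and then levels off as V approaches its maximum.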
The term engram has long been applied to the memory trace of learning. This term denotes that special alteration which occurs as a consequence of learning and which is then translated into altered behavioral output at a later time. Thus, the engram is a stored representation of the learning experience and is presumably open to investigation. See MEMORY. Physiological process. Since the behavior of all animals above the level of the simplest multicellular organisms is the product of a nervous system, the processes underlying those behavioral changes referred to as learning must be the result of some activity or change in the nervous system. The complex human nervous system includes both the central nervous system (the brain and spinal cord) and the peripheral nervous system (the nerves of the body and the autonomic ganglia). It is generally assumed that the changes important in learning occur in the central nervous system, with the peripheral nerves serving a transport function. Within the central nervous system, changes related to learning (the engram) could be limited to one specific region or be dispersed throughout the entire brain and spinal cord. In addition, the learning substrate could be at a subcellular level, within or between individual cells, or among groups of cells at a regional level. See NERVOUS SYSTEM (VERTEBRATE). Only those processes that are known to occur in the brain can be considered candidates for the substrate of learning. For instance, much is known about the mechanics of blood flow and its effects on brain function, but they are unlikely to serve as the basis of learning. However, electrical events, alterations of neural structure, biochemical changes in or around neurons, changes in synapses, and alternations of neurons over large brain regions all seem to be candidates. Generalizability. Initial observations of the brain’s control of learning occurred mainly in humans with various types of brain damage. However, with the advent of new techniques for the study of brain processes, the study of brain function in infrahuman species has become necessary, since many of the techniques are invasive and destructive. Indeed, most current knowledge of brain processes and neural function has come from animal studies, and may or may not apply to humans. When the study subjects are mammals, any processes found there probably will apply to human brain function. However, in studies on animals such as Aplysia, an invertebrate sea snail, the parallelism may be more difficult to establish. Such arguments make the assumption that the only goal is to investigate how humans function, and while this is certainly the ultimate goal, it is not the only one. Identifying such processes in these organisms allows for better care of these animals both in captivity and in the wild. In addition, since it is considerably easier to study neural process in simple animals, these studies provide clues about processes that may be important in humans. Obviously, brain processes in humans and other animals differ, but studies of simpler organisms offer clues and information about human learning processes.
Research methods. Attempts to find the neural bases of learning are ultimately aimed at understanding the function of the human central nervous system. However, because many of the techniques used in investigations are inherently destructive to some degree, moral and ethical considerations preclude their use in humans. Observation and deduction. One of the oldest and still one of the most productive techniques for the early stages of any study is the method of observation and deduction. The subject is put in a learning situation and observed for changes in behavior. Deductions can then be made about the events that take place from what is known about the function of the nervous system. I. Pavlov used that method to develop one of the most widely used learning techniques, classical conditioning, although many of Pavlov’s theories about the brain substrates of learning have been shown to be incorrect. Observation of brain-damaged humans has provided clues about brain areas that are involved in certain learning functions. While this method provides no information about the neural processes underlying various kinds of learning, the damage resulting from illness, closed head and penetrating wounds, and necessary surgical procedures has provided data on which areas of the brain, and especially the cortex, subserve many of the higher human functions. Obviously, it is ethically and morally impossible to undertake controlled lesion or ablation studies on humans, but the uncontrolled brain lesions seen in clinical medicine have been very helpful in guiding controlled studies in animals. With the advent of noninvasive scanning techniques such as positron emission tomography (PET), computerized tomography (CT), and magnetic resonance imaging (MRI), it is now possible to provide detailed images of the human brain that reveal damaged areas. See MEDICAL IMAGING. Ablation. The ablation technique involves removal of a part of the brain and observation of the effects after recovery of the subject. The underlying assumption is that removing some part of the nervous system will cause a deficit that accurately reflects the function of the lost part. This technique has been utilized successfully in brain mapping to localize various sensory and motor functions, but localization is only a partial step in characterizing the brain areas responsible for or involved in learning. Since one brain area may be involved in more than one task, the finding that a cortical area is involved in vision does not preclude its active involvement in other tasks. However, the ablation method assumes that an area is primarily involved in one task and that the function is indeed localized there. Also, when a central brain structure is removed, any subsequent impairment in learning could be due to a loss of sensory or motor ability, a loss of communication links between areas involved in learning, or the destruction of the vital learning area. The simplest ablations involve removing a brain area. Tissue may also be destroyed by passing an electric current through a needle into the tissue, or by
suction, radiation, chemicals, freezing, and stopping blood flow to the region. Reversible ablations, which render the brain tissue temporarily inactive, can be accomplished by using chemicals (such as potassium chloride) or a cold probe. Following ablation, the animal is allowed to recover and then is given a learning task. The results are compared with those from animals given the same task with no ablation or a different ablation. Performance deficits in the ablated animals are attributed to the loss of the lesioned area. Alternatively, the task may be learned first, the ablation performed, and then the animal tested for loss of learning. Such a design is aimed at localizing the engram or area where the effects of learning are stored, assuming that the storage and processing areas are the same. Stimulation. The excitation of neural tissue with electric currents or chemical agents is often thought to be the opposite of ablation since one stops activity and the other starts it. However, stimulation rarely produces the opposite results from ablation and rarely has a positive effect on learning. Stimulation of an area can produce some activity that disappears if the area is ablated. The usual result of stimulation is to disrupt or slow learning, so that it can be used as a temporary lesion, to locate sites that lead to disruption of normal learning patterns when stimulated. Electrical recording. The brain communicates through electrical signals generated by its approximately 20 billion nerve cells. Each cell can generate an electrical impulse and transmit that small electrical charge to other cells, often over substantial distances—3 ft (1 m) in the case of motor neurons that go from the spinal cord to the muscles of the feet in humans. The constant activity of many brain neurons can be recorded by electrodes either as singlecell events, as the activity of small groups of cells, or as the summed activity of millions of cells together. With large electrodes, the activity of cell populations can be recorded directly from the scalp (Fig. 1). See ELECTROENCEPHALOGRAPHY. With the development of sensitive amplifiers, single- and multiple-cell recordings became practical. With these techniques, it is possible to implant electrodes in almost any brain area and to record activity from cells for days and often weeks with no discomfort or damage to the animal. To position the electrodes precisely, a stereotaxic instrument (Fig. 2) is used to hold the anesthetized animal’s head securely and allow the electrode to be lowered into the brain to a precise location. Thus, animals with implanted electrodes can learn tasks while the activity in a brain area is monitored to see if the area participates in the learning. Cellular activity changes that seem to correlate with learning may be observed, but these may be unrelated to the learning of the task itself. Developmental methods. Increasingly, learning is tested at various stages of human or animal development and the learning possible at that developmental stage is correlated with brain tissue function at the time. As with observational methods, this method gives no
insight into the neural processes underlying learning, but it can give insight into the areas of the brain that subserve various forms of learning. For example, human fetuses late in the last 3 months before birth can learn movements in a Pavlovian motor learning situation through tactile stimuli administered on the mother's stomach over the uterus. Since the fetus can associate two such stimuli, this suggests that such motor learning may not require the active role of the cerebral cortex, which is not developed at that time. Likewise, studying the young of the sea snail Aplysia, which can learn some motor tasks, is giving investigators some insight into the parts of the nervous system underlying certain portions of the learning process. Other methods. Increasingly powerful techniques have been under development for investigating the changes underlying learning. The injection of sensitive dyes and radioactive tracers allows delineation of changes in nerve tracts or chemical alterations. Computer analysis of neural activity or activation of neurons allows the detection of processes underlying engram formation. Powerful microcomputers can record, stimulate, and even perform ablation studies that were once impossible or too time-consuming. In addition, computers can provide modeling of brain function. However, many of these methods provide clues about which brain areas may be involved in learning rather than the actual process of learning. Model approaches. Investigators of the neural substrates of learning now use the strategy of operationally defining learning. This means that learning is defined by the operations used to produce it in the experimental situation. Thus, simple associative or motor learning can be defined by the operations of classical conditioning, that is, the presentation of a conditioned stimulus and an unconditioned stimulus in a specified order and time sequence. Because learning is the development of a unique response to two repeated stimuli that are presented at the same time, the behavioral change to be studied can be specified and reproduced precisely. The use of a simple learning situation is an example of a model approach to the search for neural mechanisms of learning. The investigator can specify the learning task and simplify the complexity of the behavioral changes as well as the neural system to be studied. The human nervous system would be considered the total preparation, with human learning as the total learning system. A nonhuman species for study of a simple learning task and for analysis of the brain processes involved would be a model preparation using a model system. One of the simplest models for studying the associative learning of Pavlovian conditioning is an invertebrate, such as Aplysia. Aplysia has no central brain but rather a system of six pairs of ganglia that control its body functions and are interconnected by nerve pathways to allow coordinated activities. In addition, many of the nerve cell bodies are very large (almost visible to the naked eye) and can be identified in each animal. Aplysia can learn to associate a light
Fig. 1. Brain activity can be recorded in various ways, from single-cell activity (top tracing) to the familiar “brain waves” or electroencephalogram (bottom tracing). (After C. Cotman and J. McGaugh, Behavioral Neuroscience, Academic Press, 1980)
touch to the siphon region with an electrical shock to the tail. The shock causes the animal to quickly and fully withdraw the siphon, whereas the touch does not. Following several paired presentations of the touch (conditioned stimulus) with the shock (unconditioned stimulus), withdrawal to the touch is greatly enhanced. Because that same response does
Fig. 2. Stereotaxic instrument designed to allow precise placement of electrodes into brain structures. Note the four electrode carriers positioned above the area where the animal’s head would be placed. (Kopf Instrument Co.)
not take place in animals which receive the same stimuli in other combinations, Aplysia can be said to have learned the association. The neural changes underlying its behavior have been located in the abdominal ganglia and seem to involve the activation and alteration of facilitatory neurons that lie between the input sensory neurons and the output motor neurons, and a change in the output of the neurotransmitter from one neuron apparently activates the next neuron in the chain. Such findings cannot automatically be applied to vertebrates and mammals. Mammals have vastly more complex nervous systems and may utilize different neural mechanisms to achieve learning. An understanding of invertebrate learning, however, can offer clues for understanding human learning. See NERVOUS SYSTEM (INVERTEBRATE). Mammalian preparations. To understand learning in the mammalian nervous system, the site at which learning occurs within that system must be found. The simplest mammalian system to be studied is the isolated spinal cord, which is capable of many reflex activities even when separated from the brain. The isolated spinal cord of the cat can alter behaviors when two stimuli are presented to selected sensory inputs in the same way as classical conditioning in the intact animal. Thus, the spinal reflex pathways apparently have the ability to "learn." The neural mechanisms of the change are obscure, but they apparently take place in the small interneurons that connect the sensory inputs with the motor outputs of the spinal cord, and may be alterations in the synaptic junctions between the interneurons. See REFLEX. Classical conditioning has been shown to occur not only in the isolated spinal cord but in cats and rabbits without a cerebrum or cortex as well as in human infants born without a cerebral cortex. The ability of the mammalian nervous system to associate two meaningful stimuli and to change behavior following the association, therefore, appears to be a primitive neural attribute that is probably a basic part of all learning, including human verbal learning. Thus, associative learning appears to occur at various levels of the mammalian nervous system, and the various levels can support different complexities of learning. The total nervous system would, therefore, be necessary for the complexity of human learning. However, the question of the primary location of various learning processes remains.
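A minimal illustrative sketch (not a model taken from this article, and not tied to any particular preparation) shows in code how repeated pairing of a conditioned stimulus with an unconditioned stimulus, as in the spinal-cord and Aplysia conditioning described above, can strengthen the response to the conditioned stimulus alone, while unpaired presentations leave it unchanged. The learning rate, trial count, and asymptote of 1.0 are arbitrary assumptions made for the illustration.

```python
# Hypothetical pairing rule for classical conditioning; all parameter values
# are illustrative assumptions, not values drawn from the article.

def conditioned_strength(paired, learning_rate=0.2, trials=10):
    """Strength of the CS -> response connection after a block of trials."""
    w = 0.0
    for _ in range(trials):
        us = 1.0 if paired else 0.0      # unconditioned stimulus delivered?
        w += learning_rate * (us - w)    # strengthen only on paired trials
    return w

if __name__ == "__main__":
    print(conditioned_strength(paired=True))    # grows toward 1.0: association formed
    print(conditioned_strength(paired=False))   # remains 0.0: no association
```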
Fig. 3. Diagram of the cortical areas of the human brain. (After S. W. Ranson and S. L. Clark, Anatomy of the Nervous System, 10th ed., Saunders, 1959)
Perhaps the most intensive investigation of simple learning in an intact animal has utilized the eye blink response in rabbits. This model system confines the neural system to a specific sensory and motor segment of the very complex rabbit central nervous system. Since the same task is learned in the same way by humans, the learning task can be generalized to humans. By using Pavlovian conditioning, the rabbits learn to blink to a tone sounded just prior to an air puff to the eye. The hippocampus seems to produce a pattern of electrical activity that could be responsible for the learned behavior; unfortunately, the animal can learn the task equally well if the hippocampus is removed. Only when certain complicating factors are added to the simple learning task is the hippocampus found to be necessary for normal learning. A small area in the cerebellum seems to be necessary for learning the eye blink in the rabbit. When that area is removed, the learned behavior is lost and cannot be regained. The cerebellum has long been considered a strictly motor area involved with coordination, but the learning seen in the eye blink response occurs in the interconnections of two types of neurons in the cerebellum and may involve the alteration of connections between these neurons. Thus, the primary association for learning the eye blink response occurs in the cerebellum. These studies have begun to shed light on the actual mechanisms involved in learning and storage of that learning but have also demonstrated another problem involved in such searches—locating the site of the primary change. In more complex learning tasks, such as sensory learning, there are various levels of involvement of the higher brain structures. For example, following complete removal of the visual cortex, which subserves vision, a cat or monkey can learn to distinguish lights of different intensities and even speed of movements of visual images. Humans with damage to those visual areas of the cortex can still react to lights in the damaged area of the visual field and can even learn some tasks based on light in that area of the visual field. Some information in the visual sense is, therefore, being processed in other areas of the brain. If the cortex is not completely destroyed, some abilities to learn such things as discrimination will remain but the learning may be impaired, and often learning that had occurred prior to the lesion is lost and must be relearned. When essentially all of the cortex subserving a sense is destroyed, only rudimentary learning can occur, suggesting that some processing of the learning is done at other sites in the brain. Figure 3 shows some of the important cortical brain areas for sensory information processing and learning in the human. Observations of humans who have suffered brain damage or who have had brain areas removed surgically have provided information on the primary sites and mechanisms involved in mammalian learning. The most famous, “H. M.,” had his hippocampus removed at an early age to control intractable epilepsy. H. M. could remember clearly everything
that happened prior to surgery and had no impairment of intelligence, but after surgery he could no longer remember new facts, demonstrating that the human ability to transfer factual information into long-term memory resides in the hippocampus. However, H. M. could learn certain things. He could learn the rules of a game while he was playing it, but the next day, when asked if he remembered the game, he denied ever having seen it. When asked to play, however, he played the game like anyone else who had learned it while still denying ever having seen it. Thus, the learning of the task as a verbal or conscious process had not occurred, whereas the learning of motor patterns for playing the game had occurred and been stored. Thus, in humans the learning of a complex task such as a game is divided into at least two parts, one conscious and the other motor. Similar reports of human learning deficits have come from other brain-damaged humans who cannot remember learning anything following the damage but can relearn some sensory or motor tasks. The learning mechanisms are therefore intact for simpler forms of learning, but the results of that learning cannot be brought to consciousness. In addition, evidence suggests that other learning systems in the human are similarly separate. In one case, a brain-damaged accident victim could recall facts about the world and could learn new ones, but he could not remember anything about himself as a participant, even though he remembered who he was. This suggests that in the human, certain brain systems (probably in the frontal lobes of the human brain) may have evolved to support the complex learning of self-awareness and language, which is characteristic of humans. Theories. There have been several theories about the actual mechanisms underlying the learning process. These theories have often followed current technology, and all have been simplistic. Electrical activity. The earliest theories proposed that learning was an electrical circuit set up in some localized area of the brain that remained active, thus explaining forgetting and other aspects of memory. However, studies of cooling the brain to a point at which there was no electrical activity or giving electrical shocks to the brain that disrupted any ongoing activity showed that the memory of learned tasks survived such treatment. Following this, it was proposed that the initial form of learning was stored as an electrical event that was transformed over several minutes into a more permanent biochemical change in one or more brain neurons. Such notions were modeled after the growing complexity of the telephone system in the 1930s and 1940s. Learning was likened to a switchboard, with changes in individual neurons resulting from the learning activity and producing functional switchings and memories that could be recalled by the appropriate stimuli. Such "one neuron–one memory" theories had some appeal, but they failed to explain how many neurons could be lost from the brain with no apparent loss of memory for learned tasks. In addition, it became increasingly difficult to
account for complex learning with the activity of one neuron. Learning molecules. The deoxyribonucleic acid (DNA) hypotheses of the 1950s and 1960s popularized the theory of learning molecules. If learning was a change in some chemical in the brain or individual neuron, a stable chemical had to be found to account for the stability of the engram. The only sufficiently stable chemical was DNA. Further support for the DNA basis for learning came from studies purporting to show that Planaria flatworms that were fed other Planaria that had learned a simple task seemed to learn that same task by ingesting the trained flat worms. The hypothesis of learning molecules gained popularity because it suggested the feasibility of learning by pill. However, the effects of such learning molecules, if any, have been shown to be a general one due to enriched diet, and not a specific effect of transferring molecules. Growth theories. Evidence for a similar theory has come from studies of animals raised in different laboratory environments and given various tasks to learn. Under appropriate conditions, animals in “enriched” environments, that is, with many colors, toys, and textures, have heavier brains than similar animals raised in “poor” environments. Similarly, animals given complex motor tasks to learn may develop larger brain areas associated with motor and learning function. In these studies, however, it is difficult to separate the effects of motor activity from learning itself. The growth theory of learning states that neurons involved in learning grow larger or have more neural processes when compared with noninvolved neurons. How that would occur is not certain, but using the analogy of a muscle, a neuron would enlarge with exercise and shrink with disuse. Computer analogies. Several theories of learning have drawn analogies between the brain and computer function. These theories postulate that the brain has a central processing unit, or learning center, and that memories are stored in another location to be called up at will. In fact, the human brain does not function like a computer but rather has a vast ability to grasp general concepts and rather limited processing ability. Activity patterns. A global theory of learning process, drawn from several earlier theories, is based on a probabilistic hypothesis in which an incoming stimulus sets up generalized patterns of activity in some brain area that is capable of causing some specific pattern of motor activity. If the incoming stimulus or a similar stimulus recurs, the same pattern of activity occurs over the same brain area and the learned behavior is executed. The theory depends on the summed performance of millions of neurons. While such a theory is elegant and accounts for many features of learning and memory, it does not specify how a stimulus would create such an activity pattern, what changes would allow that pattern to be stored or recreated by other stimuli, or how the patterns would be translated into motor output. For complex learning, the concept of neural activity pattern changes seems to best account
for human learning. See PROBLEM SOLVING (PSYCHOLOGY). Michael M. Patterson Bibliography. F. E. Bloom and A. Lazerson, Brain, Mind and Behavior, 3d ed., 2000; P. M. Groves and G. V. Rebec, Introduction to Biological Psychology, 4th ed., 1992; B. Kolb and I. Q. Whishaw, Fundamentals of Human Neuropsychology, 3d ed., 1989; C. F. Levinthal, Introduction to Physiological Psychology, 4th ed., 1995; R. Ornstein, L. Carstensen, and S. Patnoe, Psychology, the Study of Human Experience, 3d ed., 1990; E. Tulving, Remembering and knowing the past, Amer. Sci., 77:361–367, July–August 1989.
Least-action principle
Like Hamilton's principle, the principle of least action is a variational statement that forms a basis from which the equations of motion of a classical dynamical system may be deduced. Consider a mechanical system described by coordinates q1, . . . , qf and their canonically conjugate momenta p1, . . . , pf. The action S associated with a segment of the trajectory of the system is defined by Eq. (1),

S = \int_c \sum_j p_j \, dq_j    (1)

where the integral is evaluated along the given segment c of the trajectory. The action is of interest only when the total energy E is conserved. The principle of least action states that the trajectory of the system is that path which makes the value of S stationary relative to nearby paths between the same configurations and for which the energy has the same constant value. The principle is misnamed, as only the stationary property is required. It is a minimum principle for sufficiently short but finite segments of the trajectory. See HAMILTON'S EQUATIONS OF MOTION; HAMILTON'S PRINCIPLE; MINIMAL PRINCIPLES.
Assume that Eq. (2) holds, where pj + δpj is canonically conjugate to qj + δqj.

S + \Delta S = \int_c \sum_j (p_j + \delta p_j) \, d(q_j + \delta q_j)    (2)

Neglecting second-order terms in Eq. (2), one obtains Eq. (3), where an integration by parts has been made, the integrated parts vanishing.

\Delta S = \int_c \sum_j (p_j \, d\,\delta q_j + \delta p_j \, dq_j) = \int_c \sum_j (\delta p_j \, dq_j - \delta q_j \, dp_j)    (3)

The vanishing of ΔS requires the integrand to be a perfect differential of a quantity whose end variations vanish. The coefficients of the variations δqj, δpj need not vanish separately because the variations are not independent, the varied q's and p's necessarily being canonically conjugate, as in Eq. (4), with terms defined by Eqs. (5).

\sum_j (\delta p_j \, dq_j - \delta q_j \, dp_j) = dU(q, p)    (4)

\delta p_j = \frac{\partial U}{\partial q_j}, \qquad \delta q_j = -\frac{\partial U}{\partial p_j}    (5)

Writing U = -H \delta t leads to Hamilton's equations of motion, Eqs. (6).

\dot{p}_j = -\frac{\partial H}{\partial q_j}, \qquad \dot{q}_j = \frac{\partial H}{\partial p_j}    (6)

The quantity H(q, p), known as the hamiltonian function, does not contain the time explicitly because U(q, p) cannot be a function of the time, since the end times are not fixed and in general will vary as the path is varied. Thus, the principle is useful only for conservative systems, where H is constant. If H(q, p) consists of a part H2 quadratic in the momenta and a part H0 independent of the momenta, then Eq. (7) holds by Euler's theorem on homogeneous functions.

S = \int_{t_1}^{t_2} \sum_j p_j \dot{q}_j \, dt = \int_{t_1}^{t_2} \sum_j p_j \frac{\partial H}{\partial p_j} \, dt = 2 \int_{t_1}^{t_2} H_2 \, dt    (7)

Usually H2 is the kinetic energy T of the system, so that the principle of least action may be written as Eq. (8), where V is the potential energy.

\Delta \int_{t_1}^{t_2} 2T \, dt = \Delta \int_{t_1}^{t_2} 2(E - V) \, dt = 0    (8)

The principle of least action derives much importance from the fact that it is the action which is quantized in the quantum form of the theory. Planck's constant is the quantum of action. See NONRELATIVISTIC QUANTUM THEORY. Philip M. Stehle
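A minimal numerical sketch (not part of the original article) can make Eqs. (6) concrete: it integrates Hamilton's equations for an assumed one-dimensional harmonic oscillator, H = p²/(2m) + kq²/2, and checks that H stays essentially constant along the computed trajectory, as the conservative-system requirement above demands. The Hamiltonian, mass, spring constant, step size, and step count are all illustrative assumptions.

```python
# Hedged illustration: integrate Hamilton's equations (6) for an assumed
# harmonic oscillator and verify that H(q, p) remains nearly constant.
# All parameter values are arbitrary choices made for the demonstration.

def hamiltonian(q, p, m=1.0, k=1.0):
    """H = p^2/(2m) + k q^2/2 for a one-dimensional harmonic oscillator."""
    return p * p / (2.0 * m) + 0.5 * k * q * q

def integrate(q, p, dt=1.0e-3, steps=20000, m=1.0, k=1.0):
    """Semi-implicit (symplectic) Euler steps of q-dot = dH/dp, p-dot = -dH/dq."""
    for _ in range(steps):
        p -= dt * k * q       # p-dot = -dH/dq = -k q
        q += dt * p / m       # q-dot =  dH/dp =  p / m
    return q, p

if __name__ == "__main__":
    q0, p0 = 1.0, 0.0
    q1, p1 = integrate(q0, p0)
    print(f"H initial = {hamiltonian(q0, p0):.6f}")
    print(f"H final   = {hamiltonian(q1, p1):.6f}")   # differs only slightly
```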
Least-squares method
A method, developed originally by A. M. Legendre, of obtaining the best values (the ones with least error) of unknown quantities supposed to satisfy a system of linear equations of the form shown as notation (1),

M_{11} a_1 + M_{12} a_2 + \cdots + M_{1m} a_m = b_1
M_{21} a_1 + M_{22} a_2 + \cdots + M_{2m} a_m = b_2
· · · · · · · · · · · · · · · · · · · · ·
M_{n1} a_1 + M_{n2} a_2 + \cdots + M_{nm} a_m = b_n    (1)

where n > m. Since there are more equations than unknowns, the system is said to be overdetermined. Furthermore, the values obtained for the unknowns by solving a given selection, m in number, of the equations will differ from the values obtained by solving another selection of equations. In the physical situation, the bi are measured quantities, the Mij are known (or assumed) quantities, and the ai are to be adjusted to their best values.
Consider a simple example. A quantity y of interest is supposed (perhaps for theoretical reasons) to be a linear function of an independent variable x. For a series of selected values x1, x2, . . . of x the values y1, y2, . . . of y are measured. The expected relation is shown as notation (2),

x_1 \alpha + \beta = y_1
x_2 \alpha + \beta = y_2
x_3 \alpha + \beta = y_3    (2)
· · · · · · · · ·

and the problem is to find the best values of α and β, that is, respectively, the slope and intercept of the line which graphically represents the function. The best values of α and β, in the least squares sense, are obtained by writing Eq. (3) and asserting that term (4) shall be minimized with respect to α and β, that is, Eqs. (5) hold.

\eta_i = y_i - (x_i \alpha + \beta)    (3)

\sum_{i=1}^{n} \eta_i^2    (4)

\frac{\partial}{\partial\alpha} \sum_{i=1}^{n} \eta_i^2 = 0, \qquad \frac{\partial}{\partial\beta} \sum_{i=1}^{n} \eta_i^2 = 0    (5)

This leads to Eqs. (6) and (7), which may be solved for α and β.

\alpha \sum_{i=1}^{n} x_i + n\beta - \sum_{i=1}^{n} y_i = 0    (6)

\alpha \sum_{i=1}^{n} x_i^2 + \beta \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i y_i = 0    (7)

For m, rather than two, unknowns the generalization is obvious in principle, although the labor of solution may be great if m is large unless a high-speed electronic computer is available. It should be noted that the measurements yi in the example have all been assumed to be equally good. If it is known that the measurements are of variable quality, a weight may be attached to each value of yi. The least squares equations are readily modified to take this into account, as in Eqs. (8), where wi is the weight of measurement yi.

\alpha \sum_{i=1}^{n} w_i x_i + \beta \sum_{i=1}^{n} w_i - \sum_{i=1}^{n} w_i y_i = 0
\alpha \sum_{i=1}^{n} w_i x_i^2 + \beta \sum_{i=1}^{n} w_i x_i - \sum_{i=1}^{n} w_i x_i y_i = 0    (8)

The least squares equations can be shown to lead to the most probable (in the statistical sense) values of the unknowns under a variety of assumptions about the measurements and their weights. In applications in the physical sciences, however, it is rarely possible to show that one's observations satisfy all, or even any, of the assumptions. However, the conditions may be approximately satisfied in many instances, and the method is widely used because of its convenience. The empirical result is that the unknowns so determined lead to excellent representations of the data in the usual case. See CURVE FITTING. McAllister H. Hull, Jr. Bibliography. R. W. Farebrother, Least Squares Computations, 1988; J. W. Longley, Least Squares Computations, 1984.
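The straight-line case of Eqs. (6)–(8) is compact enough to implement directly. The sketch below is not from the article, and the data values in it are invented for illustration; it solves the two normal equations for the slope α and intercept β, with optional weights wi that default to 1 so the unweighted Eqs. (6) and (7) are recovered.

```python
# Sketch of the straight-line least-squares fit of Eqs. (6)-(8); the sample
# data are made-up values roughly following y = 2x + 1 with noise.

def fit_line(x, y, w=None):
    """Return (alpha, beta): slope and intercept minimizing sum of w_i * eta_i^2."""
    if w is None:
        w = [1.0] * len(x)            # equally good measurements
    Sw   = sum(w)
    Swx  = sum(wi * xi for wi, xi in zip(w, x))
    Swy  = sum(wi * yi for wi, yi in zip(w, y))
    Swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    # Solve the two normal equations, Eqs. (8), for alpha and beta.
    denom = Sw * Swxx - Swx * Swx
    alpha = (Sw * Swxy - Swx * Swy) / denom
    beta  = (Swxx * Swy - Swx * Swxy) / denom
    return alpha, beta

if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [1.1, 2.9, 5.2, 7.1, 8.8]
    alpha, beta = fit_line(x, y)
    print(f"slope = {alpha:.3f}, intercept = {beta:.3f}")
```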
Leather and fur processing
The technology of processing animal hides and skins. While both leather and fur are derived from animal skins, there are essential differences in the processing techniques.
Leather
Leather processing provides a means of tailoring the unique fibrous architecture of animal skins into materials that possess those attributes for which finished leather is prized: a certain texture (feel) and appearance as well as characteristic physical and chemical properties. The raw materials, primarily skins of cattle, goats, hogs, and sheep, are actually by-products of the many worldwide meat production industries. Articles manufactured from leather include footwear, clothing, gloves, luggage, upholstery, harnesses and saddlery, and mechanical devices. Raw stock preservation. Since skins and hides putrefy rapidly after slaughter, some immediate treatment is usually necessary. While some fresh skins and hides may undergo the tannage conversion processes immediately, common salt (NaCl) is usually employed as a temporary preservative, either as a dry salt pack (Fig. 1) or as a saturated brine solution in an oval circulated vat (raceway) or in rotary vessels. In the United States, hides from cattle are most frequently treated by the agitated brine curing process.
Fig. 1. Cattlehides being cured with salt in a pack containing about 800 hides.
After the hide is removed from the carcass, it is washed in cool water to lower its temperature and to remove blood and dirt. The flesh that remains on the inner side of the hide along with any manure and other debris is often removed by machine, either before or after brine curing. The brine in the raceway is maintained as close as possible to the saturation point (preferably above 90% saturation) throughout the curing process, which extends over approximately 24 h. When properly cured, the brined hide will have a water content ranging from 40 to 48%, and the retained water will contain NaCl in excess of the 85% saturation level. Since salt-tolerant (halophilic) bacteria may contaminate recycled brines, a bactericide is added to the brine, particularly when ambient temperatures are relatively high. Hides that have been cured with either dry salt or brine may be stored for several months, especially in refrigerated storage. This temporary stability facilitates distribution of the cured hides. The epidermal tissue in the grain area of a typical hide (Fig. 2) includes the outer epidermal layers and the associated hair follicles, oil glands, sweat glands, and erector pili muscles. Collagen, a unique triple-helical biopolymer, is the leather-making hide substance; it predominates in the hide corium and is a major proteinaceous component in the grain enamel and fibrous grain layer that surrounds the epidermal tissues. Grain enamel is the extremely thin outer surface cellular layer of a skin or hide; this layer has a unique structural organization and collagen composition. The unique three-dimensional organization of the biopolymers is responsible for the strength, toughness, and flexibility of the final leather. See COLLAGEN; SKIN. The finest leather is produced from skins that possess the proper grain enamel and hair or wool follicle patterns that permit the production of a unique surface appearance in the final product. Skins or hides whose grain quality is inadequate may be finished
Fig. 2. Cross section through cattlehide. 1 mm = 0.04 in.
into products known as corrected-grain (a leather whose grain surface has been abraded to remove or obscure defects) or sueded leathers. Heavier hides may be mechanically split into two layers; the inner layer or split is usually converted to sueded split leather. These grade/type designations, including species of origin and grain integrity, are protected by fair trade rules. Beamhouse operations. The preserved dry skins or hides are rehydrated, and the curing salt is removed by soaking in a paddle vat, a rotary drum, or hide processor, a process requiring up to 12 h. Bactericides are added when the soaking process is longer or if warm water is used. The rehydrated skins or hides are then processed in a saturated lime [Ca(OH)2] solution with caustic sharpeners (alkaline additives), usually sodium sulfide or sodium sulfhydrate. This treatment may either loosen the hair and hair follicles for mechanical removal or pulp the keratinous epidermal tissue so that it may be washed off. Heavy hides may be processed with recovery of salable by-product hair in a 5-day vat process; this is used especially for production of sole leather. Processors containing material with lower ratios of solution to hide by weight and higher concentrations of sharpeners can pulp the hair in a few hours. However, this may require a few hours of additional treatment with lime solution to open up the hide corium to permit proper processing subsequently. Sheepskins are dewooled by painting a thick chemical suspension for hair removal onto the flesh side. Raw material for leathers that retain the original hair or wool, including sheepskin shearlings, is washed carefully and rewetted to preserve the wool or hair in the follicle. Chromium tannage. The beamhouse processes remove keratinous tissues from the hide and initiate the preparation of collagenous fibrous substances in the hide for tannage. Further conditioning of the grain (bating) is accomplished with enzymes; these are usually pancreatic enzymes, although sometimes bacterial or fungal enzymes are used. Deliming salts, most frequently ammonium sulfate or ammonium chloride, are employed to adjust the pH of the bath to the range of 8 to 9. See PH. Chromium tannage requires an acidic pickling process. The pickling solution is a sodium chloride brine solution (using water approximately equal in weight to the hide weight and using NaCl equal to 3% of the weight of the water used) and sufficient sulfuric acid to create an acidity of pH 2–3 in the final pickling bath. Pickling, in addition to being a tannage process step, is also used for temporary storage of the skins or hides. This is an important treatment for sheepskins after wool pulling; pickled sheepskins are shipped from New Zealand to worldwide markets. The tanning material is essentially a complex containing one-third basic chromic sulfate (CrOHSO4). The uptake of the trivalent chromium by the hide substance is moderated by using so-called masking chemicals, often sodium formate. The adsorbed
chromium may be fixed onto the hide collagen by pH adjustment (to approximately pH 3.8–4.0), with sodium bicarbonate being a common basification agent. A characteristic manifestation of chromium tannage is the increase in the hydrothermal shrinkage temperature of the hide substance from around 149°F (65°C) to temperatures exceeding 212°F (100°C). The amount of the chromium tanning complex that is fixed varies, but it usually requires more than 2% fixation (expressed as the Cr ion) on the protein basis. The tanning complex is fixed by collagenous carboxyl groups, partially forming crosslinks in the collagen molecule. The entire beamhouse and chromium tannage process for cattlehides may be accomplished in a daily cycle; however, a total time of 48 h for both processes often is preferable. Since hides may be very greasy, degreasing agents may be utilized in the beamhouse and tannage processes. These are often aqueous solutions; however, pickled sheepskins are commonly degreased with hydrocarbon solvents. The chromium-tanned stock is blue and is often known as chrome-in-the-blue stock. It is an article of international commerce, and is shipped in the wet stage. While the wet stock is resistant to bacterial deterioration, fungal growth can occur during storage. Hence, fungicides are commonly added to prevent damage. See FUNGISTAT AND FUNGICIDE. Retannage, coloring, and fat-liquoring. The wet chrome-in-the-blue tanned hides are sorted to select the raw material most appropriate for production of the more valuable full-grain leathers. The sorting also groups the stock by hide thickness and surface area to permit the optimal production of the final grain leather thickness and types. The hides are split mechanically, producing a grain layer and a so-called blue drop. After trimming, this inner split layer is usually converted into suede leathers. Retannage processes may include a second trivalent chromium tannage, a natural vegetable tannage, or a variety of organic synthetic tannages. These combined tannage processes permit the tanner to modify the ionic charges on the tanned collagen fibers, facilitating the next two steps in this continuous trio of processes. Proper retannage procedures also facilitate the surface buffing process on the dried leather, which is required for the production of corrected-grain leathers. Acidic, basic, direct, and reactive dyestuffs produce the colors required for the many fashion markets served by leather. The colorants are selected and applied either to produce surface color effects or to produce leather with complete cross-sectional color penetration. The levels of consumption of the dyes may vary from 1 to 10% by weight of the wet stock, particularly in suede leathers. See DYEING. Fat-liquoring is a lubrication process that uses oil emulsions. It is designed to produce leathers with the suppleness, temper, and strength needed for the varied end uses. The lubrication oils are most frequently natural oils, of animal (especially marine) and
vegetable origin; however, synthetic lubricants and mineral oils are also utilized to produce desired effects in the final leather. The emulsions may have either anionic, cationic, or nonionic properties that impart different characteristics to the leather. Sulfated oils, bisulfited oils, and soap emulsions are frequently used in commercial fat-liquoring systems. Levels of lubricant additives vary from a few percent to greater than 10% in oily or so-called stuffed leathers. See FAT AND OIL. Retannage, coloring, and fat-liquoring processes are continuous operations carried out in rotary drums. They are accomplished in a few hours, and they are often carried out in warm water to facilitate the process. A tannery may have available several hundred different formulas for these operations in order to produce a wide enough variety of leathers to meet consumer demand. Drying and finishing. The development of the final leather characteristics continues in the tannery processes that complete the conversion of skins or hides to leathers for varied commercial end uses. The first step is the removal of the retained process water, so that the dried leather contains about 12% moisture. Drying processes are designed to produce leathers with the desired feel, surface levelness, and area (these leathers are sold by the square foot). Very soft glove or garment leathers may be dried by slower processes during a hanging period in a drying room with controlled air flow maintained at moderate temperatures. In toggle drying processes, the moist leather is attached to specially designed frames prior to passage through a drying tunnel; this drying process yields dry leather with the desired suppleness and with somewhat increased area yield. Firmer leathers with greater area yield may be affixed upon a drying plate coated with a temporary adhesive in a unit known as a pasted leather tunnel dryer. This process often employs higher drying temperatures and is most often used to produce leather for shoe uppers. Other methods used are vacuum drying processes, radio-frequency drying processes, and dehumidification drying processes. The selection of the drying process is determined by the desired final leather characteristics and leather area yield. There has been considerable development of processes and machinery involving this step in the tannery process. Leather suppleness may be further modified after drying by dry milling in a rotary drum or by mechanical flexing processes. Furthermore, dried raw material whose grain quality is inadequate is buffed with selected abrasive papers to modify the grain enamel surface prior to finishing as corrected-grain leathers. The surface luster of the leather is developed by the application of finishing coats to the grain of the leather. These finishing formulas utilize a filmformer binder, a colorant, a carrier, and additives that promote finish coat adhesion to the leather surface to achieve desired leather surface characteristics. The film-forming binder polymers may include proteins, nitrocellulose, acrylics, or polyurethanes.
The colorants are either dyes or pigments, especially on corrected-grain leathers. Water is the basic carrier of the finishing formulation, although other solvents may be used, especially in lacquer topcoats. The finishing formulations are applied with rotary sprayers or roll coaters, with the skins or hides moving on conveyors. The carrier is removed by evaporation in heated tunnels. These finishing coat applications, which are usually repetitive, are accompanied by mechanical plating operations to facilitate the development of the desired surface luster and appearance. The finished leather is sorted by estimated cutting yield, and the surface area is measured. Packaging and shipment of the finished leather to manufacturers of leather products completes the chromium tannage conversion process. Vegetable tannage. Vegetable materials are the initial tanning materials in approximately 10% of leather production in the United States. Vegetable tanning materials are polyphenolic extracts of barks and woods. The major commercial extracts are quebracho, wattle (mimosa), and chestnut; these materials are all imported from South America, South Africa, and the Mediterranean countries. Vegetable-tanned leathers in the United States are primarily heavy cattlehide leathers produced for shoe soles and insoles, belting, strap, and specialty leathers. See WOOD CHEMICALS. Vegetable tannery processes are specialized adaptations of the general tannery processes previously discussed. The vegetable-tanned sole leather process utilizes a special hide segmentation process (Fig. 3). After the beamhouse operation, the bellies are pickled and used for chrome-tanned leathers for work gloves, and the heads may be used for dog bone production. The shoulders are used for specialty leathers, including waist belt and strap leathers. The bends are converted to shoe sole and insole leathers.
Fig. 3. Sole leather from cattlehide, showing the head, shoulder (single and double), bend (single and double), and belly areas.
The original vegetable tannage process was a 2to 3-week process in which the hide was sequentially immersed in a series of rocker vats containing progressively increasing tannin strength. This process resulted in incomplete tannin utilization and caused wastewater problems. A more recently developed method uses an initial polymeric phosphate and sulfuric acid pickling bath to create the acidic pH for tannage and to protect the grain of the hide. Vegetable tannage is then carried out in a concentrated tannin solution in a heated rocker-circulator system. This system provides better utilization of the vegetable tannin blend and produces the desired 50– 65% combined tannin, expressed on the collagenous protein content of the leather. The characteristics of solidity and compactness that are necessary to produce good soling leathers are developed further by using a rotary drum known as an oil wheel to apply oils and inorganic salts (for example, epsom salts). The leather is dried very slowly in hanging lofts, and then it is compacted by high-pressure rolling. The leather is not dyed; the grain surface is polished by glazing with a wax emulsion applied to the grain side of the bend. Fur The earliest type of clothing for primitive humans was fur, prized for warmth in cold climates and for exotic beauty, color, and texture in warmer climates. Furs possess the unique characteristics of the keratinous epidermal portion of the skin, with the collagenous corium providing the supporting fibrous matrix that enables fur to be used as a clothing material. Raw materials. The sources of the raw materials are wild and domestic animals. Trappers provide the wild animal skins, which must be preserved under field conditions, sometimes resulting in poor quality. When animals that are bred and raised in captivity, such as mink and rabbit, are produced primarily for their fur skins, better quality control is achieved. Animals whose skins are used for fur include beaver, fox, mink, muskrat, rabbit, sheep, and seal. Mink represents the highest-dollar-volume fur in worldwide trade. Woolskins from domestic sheep, in addition to being the source of shearlings for garments and footwear, are specially processed to produce mouton fur. The skin of the newborn lamb of the hardy fat-tailed karakul sheep is the source of a major commercial fur. Mink are bred to produce special color styles. Since rabbit skins are numerous and inexpensive, special breeds have also been developed to permit production of rabbit furs with varied colors and textures. Fur skins possess characteristic colors and color patterns. The natural fur pigmentation is determined by the presence of melanin, an oxidized amino acid derivative that is synthesized by specialized epidermal cells. The morphology of the epidermal system in fur skins is more complex than that found in raw materials for leather making such as calfskins. The predominant fur of the pelt contains longer hair fibers underlain by a dense, fine, soft undercoat. Some species
also have coarser specialized hairs, including guard hairs. Processing. The fur dresser rehydrates the fur skins carefully to avoid damage to the expensive raw material. Chemical additives are avoided to protect the integrity of the fur. Mechanical action is minimized, especially on fine, thinner skins. The rehydrated, softened skin is then fleshed, using special machines. Careful fleshing is a necessity, since dressing agents must penetrate from the flesh side of the corium and mechanical action must be avoided. Tannage, or dressing, of fur skins may employ processes related to those used in leather tannage. Since garments fashioned of fur are seldom exposed to heavy rain and are not laundered, pickled skins containing salt may be merely lubricated prior to drying and finishing. However, alum or chromium tannage processes and aldehyde (formaldehyde or glutaraldehyde) tannages may be employed to stabilize the corium protein further, providing increased resistance to water and heat. Excess fixation of chromium onto the fur protein can lead to color variations in the fur. Chromium tannages may also interact with bleaching processes and dyeing procedures involving oxidation. Oiling of fur skins is accomplished by applying light oils with low viscosity to the flesh side of the skin. Fat-liquoring operations using oil emulsions in rotary drums would produce fur entanglement and wool felting. Heavier skins may have the oil worked into the skin corium in a device known as a kicker tub; unsaturated oils may be used in this process for lubrication and to produce an oxidized oil tannage. Adjustment of the fur color to produce fashionable colors is a complex process; its use is regulated to assure correctness of labeling. In fur processing, the hydrothermal stabilization of a skin substance is often low; thus removal of excess water requires care. Centrifugal processes may be used, but the corium retains moisture more tenaciously than the fur fibers. Hanging the fur skins to dry under controlled conditions of temperature and humidity in order to avoid damage completes the process. However, if the corium has been dried to a 15% moisture content, the fur will often be overdried, with low flexibility and excess static electric charge. Agitation in a drum containing sawdust is used to equalize moisture content in the corium and fur and to dry-clean the fur. After sawdust removal and adjustment of the corium moisture to about 25%, careful stretching or staking develops skin softness. Selective removal of guard hairs and shearing are special mechanical processes that may be used. The fur gloss is developed by ironing in rotary ironing machines. Robert M. Lollar Bibliography. K. Bienkiewicz, Physical Chemistry of Leather Making, 1983; F. O'Flaherty, W. T. Roddy, and R. M. Lollar, The Chemistry and Technology of Leather Manufacturing, Amer. Chem. Soc. Monogr. 134, 4 vols., 1956–1965, reprint 1978; G. C. Moseley, Leather Goods Manufacture: Methods and
Processes, 1986; J. H. Sharphouse, Leather Technician's Handbook, 1983; T. C. Thorstensen, Practical Leather Technology, 4th ed., 1993.
Lecanicephaloidea An order of tapeworms of the subclass Cestoda. All species are intestinal parasites of elasmobranch fishes. These tapeworms are distinguished by having a peculiar scolex divided into two portions. The lower portion is collarlike and bears four small suckers; the upper portion may be discoid or tentacle-bearing and is provided with glandular structures (see illus.). The scolex is usually buried in the
Anterior end of a lecanicephaloid tapeworm.
intestinal wall of the host and may produce local pathology. The anatomy of the segments is very similar to that of the Proteocephaloidea. Some authorities place the lecanicephaloids in the order Tetraphyllidea, to which they are obviously closely related. Essentially nothing is known of the life history of lecanicephaloids. See EUCESTODA; TETRAPHYLLIDEA. Clark P. Read
Lecanorales An order of the Ascolichenes, also known as the Discolichenes. Lecanorales is the largest and most typical order of lichens and parallels closely the fungal order Helotiales. The apothecia are open and discoid, with a typical hymenium and hypothecium. There are four growth forms—crustose, squamulose, foliose, and fruticose—all showing greater variability than any other order of lichens. Reproduction. Details of sexual reproduction are at best poorly known. Meiotic and mitotic stages leading to spore formation seem normal, but earlier stages of fertilization have not been studied. It is not known if fertilization is somatogamous, or if the microconidia act as spermatia and fertilize the egg in the ascogonium. Mature spores burst the ascal wall and are ejaculated from the apothecial disk to heights of several centimeters. When the spores are
germinated in pure culture, a distinct though slow-growing mycelium is often formed. The mycelium has no resemblance to the growth form of the original parent lichen, nor have apothecia or pycnidia ever been seen in culture. Furthermore, it is doubtful whether anyone has ever synthesized a lichen by combining a fungal culture with an alga. In nature vegetative reproduction is highly developed. Isidia, minute coralloid outgrowths of the cortex, and soredia, powdery excrescences from the medulla, are present in many species. They apparently grow into new lichens when dislodged from the thallus. See REPRODUCTION (PLANT). Taxonomy. The Lecanorales is divided into 25 families, about 160 genera, and 8000–10,000 species. Family divisions are based on growth form of the thallus, structure of the apothecia, the species of symbiotic algae present, and spore characters. Species are separated by such characters as isidia, soredia, rhizines, and pores, and by chemistry. The larger families are described below. Cladoniaceae. This family includes the reindeer mosses and cup lichens. The main thallus is a hollow structure in the shape of a stalk or cup, or is richly branched. There are about 200 species of Cladonia; the most highly developed are found in the boreal and arctic regions. Lecanoraceae. The thallus is typically crustose, and the apothecia have a distinct thalloid rim. This family is primarily temperate and boreal. The largest genus, Lecanora, has over 500 species. Lecideaceae. This family differs from the Lecanoraceae in lacking a thalloid rim around the apothecia. It is also very common in temperate and boreal zones and includes one of the largest and most difficult lichen genera to diagnose, Lecidea, with over 1000 species. Parmeliaceae. This family, known as the foliose shield lichens, is common on trees throughout the world. Parmelia has almost 1000 species; Cetraria, a common genus on pine trees, includes the well-known Iceland moss (Cetraria islandica). Umbilicariaceae. The thallus is circular in outline, quite large, and umbilicate. The family occurs on acidic rocks in temperate, boreal, and arctic regions. The members are known as the rock tripes. Usneaceae. The beard lichens comprise perhaps the outstanding lichen family because of their conspicuous fruticose growth form. The largest genus is Usnea, some species of which grow to 5 ft (1.5 m) in length. Mason E. Hale
Lectins A class of proteins of nonimmune origin that bind carbohydrates reversibly and noncovalently without inducing any change in the carbohydrate. Since their discovery in 1888, lectins have been labeled agglutinins, antibodylike molecules, hemagglutinins, heteroagglutinins, heterophile agglutinins, natural or normal antibodies, protectins, phytohemagglutinins, and receptor-specific proteins, among other names.
The main feature of lectins is their ability to recognize and bind specific carbohydrate structures becoming the translators of the sugar code (the coding of biological information by sugar structures). Lectins bind a variety of cells having cell-surface glycoproteins (carbohydrate-bound proteins) or glycolipids (carbohydrate-bound lipids). The presence of two or more binding sites for each lectin molecule allows the agglutination (clumping) of many cell types, and the agglutination reaction has been used extensively to detect the presence of lectins in extracts from different organisms. However, multivalency (having several sites of attachment) may not be an absolute requirement, even though it is still an important factor for most lectins. Lectins are ubiquitous in nature and may have very different roles according to the organism, tissue, or developmental stage. Their potential ligands, simple or complex carbohydrates, are present in all living cells and in biological fluids, which suggests that protein–carbohydrate interactions constitute basic phenomena common to all organisms. See CARBOHYDRATE; PROTEIN. Distribution. Lectins have been identified in species of virtually all taxa, from viruses and bacteria to vertebrates, and they occur in both prokaryotic and eukaryotic organisms. Agglutinating activity has been found in more than 1000 plant taxa. At present, around 1000 sequences have been determined for 241 different plants. Moreover, most of the best-characterized lectins come from a single family, Leguminosae, and they constitute about 60% of the well-known lectins. Lectins have been found in almost every plant tissue, but their distribution in different organs varies within the various families already studied. They are abundant in seeds (Leguminosae, Euphorbiaceae, Gramineae), fruits, tubers and bulbs (Solanaceae, Liliaceae), roots (Cucurbitaceae, Leguminosae), and stems and leaves (Cactaceae, Orchidaceae). At the subcellular level, they are present in plant cell walls of almost all tissues, and may have a fundamental role in the assembly of cell-wall polysaccharides and glycoproteins. See PLANT METABOLISM. Recent years have seen an explosion of interest and research in animal lectins. Good examples are galectins (galactose-binding lectins); calciumdependent (C-type) lectins, which include selectins (membrane-bound lectins specific for leukocyte adhesion) and collectins (mannose-specific lectins); and annexin proteins with affinity for lipids and sugars present in almost all organisms. Vertebrate lectins occur as soluble or integral membrane proteins in embryonic and adult fluids, organs, and tissues. Invertebrate lectins occur in body fluids or secretions, such as fish serum, snake venom, seminal and coelomic fluids, and hemolymph. Microbial lectins have been isolated mainly from bacteria, but they are also found in viruses, slime molds, protozoa, green algae, and fungi. In many cases the microbial lectins are bound to the bacterial surface and have only one recognition site. The bacteria have the ability to agglutinate other cells, but the isolated lectin does not
Lectins and is referred to as lectinlike protein. Initially, lectins were classified according to their saccharide specificity. However, the introduction of the techniques of molecular biology has changed lectin classification. The homology in the structure of the carbohydrate recognition domain allowing for the drawing of genealogical trees has replaced this criterion. Lectins are classified based on their amino acid sequence homology and relatedness. The crystal structures of more than 500 lectins of different origin have been determined, and a very complete database is available with the known three-dimensional structures. Molecular properties. The molecular structures of lectins from different organisms show no general pattern. This is not surprising if one considers that plants and animals have undergone long evolutionary divergence. Carbohydrate-binding molecules appear to have arisen at different times and are products of very different evolutionary histories. Lectins can be monomeric or oligomeric. The latter are usually composed of two to four polypeptide chains and have one binding site per chain. However, they are very different in their chemical and physical properties. Most have covalently linked carbohydrates, which account for 50% in the potato lectin and only 0.5% in the lectin of the sponge Axinella polypoides. Some, like the jack bean lectin, concanavalin A, are completely devoid of sugars. The molecular weights of the native oligomers range from 8500 for the lectin of the stinging nettle (Urtica dioica) to 600,000 for the lectin from the tunicate Halocynthia roretzi. On the other hand, lectins from species that are closely related taxonomically exhibit high homology of the gene sequences encoding for those proteins, as has been shown for animal lectins and legume or cereal lectins. Some lectins require calcium, as in the animal C-type lectins; others require divalent cations (calcium, magnesium, or manganese) to bind their carbohydrate ligands, as in legume lectins. Other lectins do not require divalent cations for binding activity and instead call for the presence of thiol groups, as in the wheat germ agglutinin or the soluble vertebrate lectins (galectins). The sugar specificity of a lectin is defined by the carbohydrate for which it shows the highest affinity, which means that lectins considered specific for one monosaccharide may also bind other structurally related carbohydrates having a lower affinity. In general, lectins interact with the nonreducing glycosyl groups of polysaccharides and glycoproteins, but some can bind internal sugars or sugars at the reducing end. Some lectins can recognize one particular monosaccharide, which suggests the presence of a small binding site, whereas others bind preferentially to trisaccharides or tetrasaccharides and have an extended binding site. Lectins binding oligosaccharides are thought to be more specific for their ligand than those binding monosaccharides. Another important property of some lectins is their ability to induce cell proliferation of particular tissues or cells. Such mitogenic lectins are used extensively in biomedical research.
It seems clear that a wide variety of proteins share the ability of lectins to bind carbohydrates. Lectins from closely related organisms probably stem from common ancestors and have high degrees of homologies. That is particularly clear in the relatively less complex structure of plant lectins. However, lectins related to a specific function may or may not have structural homologies, depending on their evolutionary history. In addition, the more complex structure of vertebrate lectins indicates that they are built with different polypeptide domains that present homologies to many unrelated proteins. The carbohydrate recognition domain is the only one offering a certain degree of homology. Biological functions. From the known examples, it seems clear that lectins are involved in recognition phenomena, and their ability to bind particular carbohydrate structures is the key to their biological functions. These recognition functions include their involvement in interactions with cells or extracellular materials from the same organism (which could be considered self-recognition, or recognition of endogenous ligands) and interactions with foreign particles or cells (recognition of non-self, or recognition of exogenous ligands). See IMMUNOLOGICAL PHYLOGENY. Animal lectins are involved in a variety of cellular processes, including intracellular transport and extracellular assembly, cell–cell and cell–matrix interactions, cell growth control and apoptosis (programmed cell death), and immune functions. Microbial lectins largely function in host cell attachment, tissue colonization, and invasion. Plant lectins have been known for more than 100 years. Although their biological roles are not fully understood, the plant lectins are involved in deposition of storage proteins, maintenance of seed dormancy, defense against pathogen and animal predators, symbiosis, transport of carbohydrates, mitogenic stimulation of embryonic plant cells, assembly and elongation of cell walls, and recognition of pollen. Lectins as tools. Lectins are very useful reagents for the study of complex carbohydrates and cell surfaces, for the separation and identification of particular cells, and for the stimulation of cell proliferation. Lectins are the main tools for glycan (polysaccharide) profiling aiming at comprehensive elucidation of glycan functions in all organisms. Lectins covalently attached to insoluble matrices are used to separate glycoproteins or glycopeptides that contain different carbohydrates. Labeled lectins are also used in histochemical and cytochemical studies to localize glycoconjugates that carry particular sugars. This technique is particularly interesting, since changes in lectin-binding patterns occur during embryonic differentiation, malignant transformation, aging, and some pathological conditions. The spectrum of uses for lectins as specific reagents continues to expand. Rafael Pont-Lezica Bibliography. D. C. Kilpatrick, Handbook of Animal Lectins: Properties and Biomedical Applications, Wiley, 2000; N. Sharon and H. Lis, Lectins, 2d ed., Kluwer Academic, 2003.
751
752
Lecythidales
Lecythidales An order of flowering plants, division Magnoliophyta (Angiospermae), in the subclass Dilleniidae of the class Magnoliopsida (dicotyledons). The order consists of the single family Lecythidaceae, with about 400 species. They are tropical, woody plants with alternate, entire leaves, valvate sepals, separate petals, numerous centrifugal stamens, and a syncarpous, inferior ovary with axile placentation. Brazil nuts are the seeds of Bertholletia excelsa, a member of the Lecythidaceae. See BRAZIL NUT; DILLENIIDAE; MAGNOLIOPHYTA; PLANT KINGDOM. Arthur Cronquist
Legendre functions Solutions to the differential equation (1 − x2)y − 2xy + v(v + 1)y = 0. Legendre polynomials. The most elementary of the Legendre functions, the Legendre polynomial Pn(x) can be defined by the generating function in Eq. (1). More explicit representations are Eq. (2), and the hypergeometric function, Eq. (3). (1 − 2xr + r2 )−1/2 =
∞
Pn (x)rn
(1)
n=0
Pn (x) =
(−1)n d n (1 − x2 )n 2n n! dxn
Pn (x) = 2 F1 [−n, n + 1; 1; (1 − x)/2]
(2) (3)
See HYPERGEOMETRIC FUNCTIONS. Generating function (1) implies Eq. (4). The func-
Relation to trigonometric functions. When x = cos θ , Legendre polynomials have a relation with trigonometric functions, given by Eq. (5), where |R(n,θ )| < = 1/2 1 (sin θ)1/2 n + Pn (cos θ) 2
=
2 cos π
1 π n+ θ− + R(n, θ ) 2 4
A/(n sin θ), A being a fixed constant, when c/n < =θ < π − c/n for c > 0. See TRIGONOMETRY . = Properties. A graph of Pn(x) [Fig. 1] illustrates a number of properties. Pn(x) is even or odd as n is even or odd, that is, Pn(−x) = (−1)nPn(x). All the zeros of Pn(x) are real and lie between −1 and 1. The zeros of Pn+1(x) separate the zeros of Pn(x). Pn(x) satisfies the inequality |Pn(x)| < = 1, −1 < =x< = 1. The successive relative maxima of |Pn(x)| increase in size as x increases over the interval 0 < =x< = 1. The closest minimum to x = 1 of Pn(x) called µ1,n, satisfies µ1,n < µ1,n+1. Similar inequalities hold for the first maxima of Pn(x) to the left of x = 1, and for the kth relative maxima or minima. All of these results have been used in applications. The last two results about the relative maxima were used to obtain bounds on the phase of scattering amplitudes. From an important limiting relation of F. G. Mehler, Eq. (6), it is easy θ lim Pn cos = J0 (θ) (6) n→∞ n to show that µ1,n approaches the minimum value of the Bessel function J0. See BESSEL FUNCTIONS.
(a2 − 2ar cos θ + r2 )−1/2 ∞ 1 = Pn (cos θ) (r/a)n a n=0
(5)
Pn (x)
(4)
1
0 0 and a diverging lens if φ < 0. When φ = 0, the lens is afocal. Several types of collecting and diverging lenses are shown in Fig. 1.
Lens (optics) (a)
(b)
(c)
collecting or positive lenses
(d)
(e)
(f)
diverging or negative lenses
Fig. 1. Common lenses. (a) Biconvex. (b) Plano-convex. (c) Positive meniscus. (d) Biconcave. (e) Plano-concave. (f) Negative meniscus. (After F. A. Jenkins and H. E. White, Fundamentals of Optics, 3d ed., McGraw-Hill, 1957)
The surfaces of most lenses are either spherical or planar, but nonspherical surfaces are used on occasion to improve the corrections without changing the power of the lens. See OPTICAL SURFACES. A concentric lens is a lens whose two surfaces have the same center. If the object to be imaged is also at the center, its axis point is sharply imaged upon itself, and since the sine condition is fulfilled, the image is free from asymmetry. Such a lens can be used as an additional system to correct meridional errors. Another type of lens consists of an aplanatic surface followed by a concentric surface, or vice versa. Such a lens divides the focal length of the original lens to which it is attached by n2, thus increasing the f-number by a factor of n2 without destroying the axial correction of the preceding system. It does introduce curvature of field which makes a rebalancing of the whole system desirable. See ABERRATION (OPTICS). Cemented lenses. Consider a compound lens made of two or more simple thin lenses cemented together. Let the power of the kth simple lens be k and its Abbe value ν k. The difference between the powers of the combination for wavelengths corresponding to C and F is given by Eq. (3) where N may be considered F − C =
k = N νk
(3)
to be the effective ν-value of the combination. The ν-values of optical glasses vary between 25 and 70, with the ν-value of fluorite being slightly larger (ν = 95.1). By using compound lenses, effective values of N can be obtained outside this range. Color correction is achieved as N becomes infinite, so that F − C = 0. A lens so corrected is called an achromat. In optical design, it is sometimes desirable to have negative values of N to balance the positive values of the rest of the system containing collecting lenses. Such a lens is said to be hyperchromatic. A cemented lens corrected for more than two colors is said to be apochromatic. A lens corrected for all colors of a sizable wavelength range is called a superachromatic lens. See CHROMATIC ABERRATION; OPTICAL MATERIALS. Lens Systems Optical systems may be divided into four classes: telescopes, oculars (eyepieces), photographic objec-
tives, and enlarging lenses. See EYEPIECE; OPTICAL MICROSCOPE. Telescope systems. A lens system consisting of two systems combined so that the back focal point of the first (the objective) coincides with the front focal point of the second (the ocular) is called a telescope. Parallel entering rays leave the system as parallel rays. The magnification is equal to the ratio of the focal length of the first system to that of the second. If the second lens has a positive power, the telescope is called a terrestrial or keplerian telescope and the separation of the two parts is equal to the sum of the focal lengths. If the second lens is negative, the system is called a galilean telescope and the separation of the two parts is the difference of the absolute focal lengths. The galilean telescope has the advantage of shortness (a shorter system enables a larger field to be corrected); the keplerian telescope has a real intermediate image which can be used for introducing a reticle or a scale into the intermediate plane. Both objective and ocular are in general corrected for certain specific aberrations, while the other abberations are balanced between the two systems. See TELESCOPE. Photographic objectives. A photographic objective images a distant object onto a photographic plate or film. See PHOTOGRAPHY. The amount of light reaching the light-sensitive layer depends on the aperture of the optical system, which is equivalent to the ratio of the lens diameter to the focal length. Its reciprocal is called the f-number. The smaller the f-number, the more light strikes the film. In a well-corrected lens (corrected for aperture and asymmetry errors), the f-number cannot be smaller than 0.5. The larger the aperture (the smaller the f-number), the less adequate may be the scene luminance required to expose the film. Therefore, if pictures of objects in dim light are desired, the f-number must be small. On the other hand, for a lens of given focal length, the depth of field is inversely proportional to the aperture. Since the exposure time is the same for the center as for the edge of the field, it is desirable for the same amount of light to get to the edge as gets to the center, that is, the photographic lens should have little vignetting. The camera lens can be considered as an eye looking at an object (or its image), with the diaphragm corresponding to the eye pupil. The gaussian image of the diaphragm in the object (image) space is called the entrance (exit) pupil. The angle under which the object (image) is seen from the entrance (exit) pupil is called the object (image) field angle. For most photographic lenses, the entrance and exit pupils are close to the respective nodal points; for such lenses, the object and the image field angles are equal. In general, photographic objectives with large fields have small apertures: those with large
759
760
Lens (optics)
(a)
(b)
(c)
(d)
(e) Fig. 2. Older cameralenses. (a) Meniscus. (b) Simple achromat. (c) Periskop. (d) Hypergon wide-angle. (e) Symmetrical achromat.
apertures have small fields. The construction of the two types of systems is quite different. One can say in general that the larger the aperture, the more complex the lens system must be. There exist cameras (so-called pinhole cameras) that do not contain any lenses. The image is then produced by optical projection. The aperture in this case should be limited to f/22. Other types of lenses. A single meniscus lens, with its concave side toward the object and with its stop in front at its optical center, gives good definition at f/16 over a total field of 50◦ (Fig 2a). The lens can be a cemented doublet for correcting chromatic errors (Fig. 2b). For practical reasons, a reversed meniscus with the stop toward the film is often used.
(a)
(b)
(c)
Fig. 3. Types of anastigmats. (a) Celor. (b) Tessar. (c) Dagor.
(a)
(b)
(c)
Fig. 4. Modern camera lenses. (a) Sonnar. (b) Biotar. (c) Topogon.
Combining two meniscus lenses to form a symmetrical lens with central stop makes it possible to correct astigmatic and distortion errors for small apertures as well as large field angles (Fig. 2c). The basic type of wide-angle objective is the Hypergon, consisting of two meniscus lenses concentric with regard to the stop (Fig. 2d). This type of system can be corrected for astigmatism and field curvature over a total field angle of 180◦ but it can only be used for a small aperture (f/12), since it cannot be corrected for aperture errors. The aperture can be increased to f/4 at the expense of field angle by thickening and achromatizing the meniscus lenses and adding symmetrical elements in the center or at the outside of the basic elements. Two positive achromatic menisci symmetrically arranged around the stop led to the aplanatic type of lens (Fig. 2e). This type was spherically and chromatically corrected. Since the field could not be corrected, a compromise was achieved by balancing out sagittal and meridional field curvature so that one image surface lies in front and the other in back of the film. Anastigmatic lenses. The discovery of the Petzval condition for field correction led to the construction of anastigmatic lenses, for which astigmatism and curvature of field are corrected. Such lenses must contain negative components. The Celor (Gauss) type consists of two air-spaced achromatic doublets, one on each side of the stop (Fig. 3a). The Cooke triplet combines a negative lens at the aperture stop with two positive lenses, one in front and the other in back. It is called a Tessar (Fig. 3b) if the last positive lens is a cemented doublet, or a Heliar if both positive lenses are cemented. The Dagor type consists of two lens systems that are nearly symmetrical with respect to the stop, each system containing three or more lenses (Fig. 3c). Modern lenses. To increase the aperture, the field, or both, it is frequently advantageous to replace one lens by two separated lenses, since the same power is then achieved with larger radii and this means that the single lenses are used with smaller relative apertures. The replacing of a single lens by a cemented lens changes the color balance, and thus the designer may achieve more favorable conditions. Moreover, the introduction of new types of glass (first the glasses containing barium, later the glasses containing rare earths) led to lens elements which for the same power have weaker surfaces and are of great help to the lens designer, since the errors are reduced. Of modern designs the most successful are the Sonnar, a modified triplet, one form of which is shown in Fig. 4a; the Biotar (Fig. 4b), a modified Gauss objective with a large aperture and a field of about 24◦; and the Topogon (Fig. 4c), a periscopic lens with supplementary thick menisci to permit the correction of aperture aberrations for a moderate aperture and a large field. One or two
Lentil plane-parallel plates are sometimes added to correct distortion. Special objectives. It is frequently desirable to change the focal length of an objective without changing the focus. This can be done by combining a fixed near component behind the stop with an exchangeable set of components in front of the stop. The designer has to be sure that the errors of the two parts are balanced out regardless of which front component is in use. For modern ways to change the magnification see ZOOM LENS The telephoto objective is a specially constructed objective with the rear nodal point in front of the lens, to combine a long focal length with a short back focus. See TELEPHOTO LENS. The Petzval objective is one of the oldest designs (1840) but one of the most ingenious. It consists in general of four lenses ordered in two pairs widely separated from each other. The first pair is cemented and the second usually has a small air space. For a relatively large aperture, it is excellently corrected for aperture and asymmetry errors, as well as for chromatic errors and distortion. It is frequently used as a portrait lens and as a projection lens because of its sharp central definition. Astigmatism can be balanced but not corrected. Enlarger lenses and magnifiers. The basic type of enlarger lens is a holosymmetric system consisting of two systems of which one is symmetrical with the first system except that all the data are multiplied by the enlarging factor m. When the object is in the focus of the first system, the combination is free from all lateral errors even before correction. A magnifier in optics is a lens that enables an object to be viewed so that it appears larger than its natural size. The magnifying power is usually given as equal to one-quarter of the power of the lens expressed in diopters. See DIOPTER; MAGNIFICATION. Magnifying lenses of low power are called reading glasses. A simple planoconvex lens in which the principal rays are corrected for astigmatism for a position of the eye at a distance of 10 in. (25 cm) is well suited for this purpose, although low-power magnifiers are often made commercially with biconvex lenses. A system called a verant consists of two lenses corrected for color, astigmatism, and distortion. It is designed for stereoscopic vision at low magnification. See STEREOSCOPY. For higher magnifications, many forms of magnifiers exist. One of the basic designs has the form of a full sphere with a diaphragm at the center, as shown in Fig. 5a. The sphere may be solid or it may be filled with a refracting liquid. When it is solid, the diaphragm may be formed by a deep groove around the equator. Combinations of thin planoconvex lenses as shown in Fig. 5b and c are much used for moderate powers. Better correction can be attained in the aplanatic magnifier of C. A. Steinheil, in which a biconvex crown lens is cemented between a pair of flint lenses (Fig. 5d). A design by C. Chevalier (Fig. 5e) aims for a large
crown
(a)
(b)
(c)
(d)
761
flint
(e)
Fig. 5. Typical magnifiers. (a) Sphere with equatorial diaphragm. (b,c) Planoconvex lens combinations. (d) Steinheil triple aplanat. (e) Chevalier type.
object distance. It consists of an achromatic negative lens combined with a distant collecting front lens. A magnifying power of up to 10× with an object distance up to 3 in. (75 mm) can be attained. Max Herzberger Bibliography. E. Hecht and A. Zajac, Optics, 3d ed., 1997; M. Herzberger, Modern Geometrical Optics, 1958, reprint 1980; F. A. Jenkins and M. E. White, Fundamentals of Optics, 4th ed., 1976; R. Kingslake, R. R. Shannon, and J. C. Wyant (eds.), Applied Optics and Optical Engineering, 15 vols., 1965–2000; D. Malacara and Z. Malacara, Handbook of Lens Design, 1994; S. F. Ray, Applied Photographic Optics, 2d ed., Focal Press, 1996; W. J. Smith, Modern Lens Design: A Resource Manual, 1992.
Lentil A semi-viny annual legume with slender tufted and branched stems 18–22 in. (46–56 cm) long. The lentil plant (Lens esculenta) was one of the first plants brought under cultivation. They have been found in the Bronze Age ruins of the ancient lake dwellings of St. Peter’s Island, Lake of Bienne, Switzerland. Lentils have been discovered in Poland dating back to the Iron Age. In the Bible the “red pottage” for which Esau gave up his birthright to his brother, Jacob, was probably lentil soup. Large-seeded lentils originated in the Mediterranean region; medium-sized lentils originated in the inner mountains of Asia Minor; and Afghanistan was the original home of the smallest-seeded lentils. See LEGUME. Production. The world’s lentil production is centered in Asia, with nearly two-thirds of the production from India, Pakistan, Turkey, and Syria. Whitman and Spokane counties in Washington, and Latah, Benewah, and Nez Perce counties in Idaho grow about 95% of the lentils produced in the United States. Description. Lentil leaves are pinnately compound and generally resemble vetch. The plant has tendrils similar to those of pea plants. The seeds grow in short broad pods, each pod producing two or three thin lens-shaped seeds (see illus.). Seed color varies from yellow to brown and may be mottled, although mottled seeds are not desirable for marketing. The lentil flowers, ranging in color from white to light purple, are small and delicate and occur at different locations on the stems.
762
Lenz’s law
Lens-shaped lentil seeds.
Lentil seed is used primarily for soups but also in salads and casseroles. Lentils are more digestible than meat and are used as a meat substitute in many countries. Culture. Lentils require a cool growing season; they are injured by severe heat. Therefore, they are planted in April, when soil moisture is adequate and temperatures are cool. A fine firm seedbed is required; the land is usually plowed in the fall and firmed by cultivation in the spring before seeding. Lentils are usually planted in rotation with winter wheat. Seeds are planted in 7–12 in. (21–30 cm) rows at depths of 1/2 to 1 in. (13 to 25 mm), on an average of 60–75 lb/acre (67–84 kg/hectare). Applications of sulfur, molybdenum, and phosphorus are used to increase yields. Wild oats often infest lentil fields, but chemicals are available to aid weed control. Cowpea and black-bean aphids are the two most important insect pests on lentils. Predators such as ladybird beetles and syrphid-fly larvae usually keep these insects under control, but when their populations are too low to control the aphids, chemical insecticides are used. Harvesting. Lentils are mowed or swathed when the vines are green and the pods have a golden color. About 10 days later, lentils are ready to combine harvest using the same combines that are used for wheat, oats, and barley. A pick-up attachment picks up the material from the windrows. The combines must be operated at a maximum speed of 11/2 mi/h (0.7 m/s) to prevent loss of, and damage to, the lentil seed. Average yields are about 900 lb/acre (1000 kg/ha). See AGRICULTURAL MACHINERY; AGRICULTURAL SOIL AND CROP PRACTICES. Kenneth J. Morrison Bibliography. O. N. Allen and E. K. Allen, The Leguminosae: A Source Book of Characteristics, Uses, and Nodulation, 1981; R. J. Summerfield and A. H. Bunting (eds.), Advances in Legume Science, vol. 1, 1980.
Lenz’s law A law of electromagnetism which states that, whenever there is an induced electromotive force (emf) in a conductor, it is always in such a direction that
the current it would produce would oppose the change which causes the induced emf. If the change is the motion of a conductor through a magnetic field, as in the illustration, the induced current must be in such a direction as to produce a force opposing the motion. If the change causing the emf is a change of flux threading a coil, the induced current must produce a flux in such a direction as to oppose the change. That is, if the change is an increase of flux, the flux due to the induced current must be opposite in direction to the increasing flux. If the change is a decrease in flux, the induced current must produce flux in the same direction as the decreasing flux.
south pole A
motion
B
current
magnetic field
north pole
Induced emf in a moving conductor. Direction of current induced in wire AB is indicated by the arrows. ( After M. W. White, K. V. Manning and R. L. Weber, Practical Physics, 2d ed., McGraw-Hill, 1955)
Lenz’s law is a form of the law of conservation of energy, since it states that a change cannot propagate itself. See CONSERVATION OF ENERGY; ELECTROMAGNETIC INDUCTION; ELECTROMOTIVE FORCE (EMF). Kenneth V. Manning
Leo The Lion, a northern zodiacal constellation (see illustration). Its head and front legs are marked by stars tracing out a backward question mark (without a bottom dot), also known as the Sickle. The bright star Regulus is at the bottom of the Sickle. The end of Leo’s tail is marked by the bright star Denebola; its name is Arabic for lion’s tail. See ZODIAC. Leo is in the evening sky during the spring and early summer. The Leonid meteor shower, which appears to emanate from Leo each November 17, is especially outstanding every 33 years, most recently in 1999–2001, so it will next be prominent starting around 2032. See METEOR. The modern boundaries of the 88 constellations,
Lepidodendrales right ascension 11h
12h
10h
URSA MAJOR
LYNX
Magnitudes: 1.0–1.5
+30°
LEO MINOR CANCER
ζ δ
+20°
2.0–2.5
κ
µ ε
The Sickle
1.5–2.0
2.5–3.0 3.0–3.5
λ
3.5–4.0
γ
4.0–4.5
declination
4.5–5.0
LEO
ϑ
β
Regulus
ι
σ
VIRGO
ρ
χ
5.5–6.0
ψ
Denebola +10°
5.0–5.5
η
Variable
e
α ν
R
ξ ω
ο
π
VY
τ
HYDRA
SEXTANS 0°
c
ti clip
υ ϕ
CRATER
Alphard ◦
Modern boundaries of the constellation Leo, the Lion. The celestial equator is 0 of declination, which corresponds to ◦ celestial latitude. Right ascension corresponds to celestial longitude, with each hour of right ascension representing 15 of arc. Apparent brightness of stars is shown with dot sizes to illustrate the magnitude scale, where the brightest stars in the sky are 0th magnitude or brighter and the faintest stars that can be seen with the unaided eye at a dark site are 6th magnitude. (Wil Tirion)
including this one, were defined by the International Astronomical Union in 1928. See CONSTELLATION. Jay M. Pasachoff
Lepidodendrales An extinct order of the class Lycopsida which, together with Isoetales, forms the monophyletic rhizomorphaleans, the most derived and diverse group of clubmosses. The lepidodendraleans are best known as the scale trees that dominate most reconstructions of Upper Carboniferous swamps, where they were the main constituent of many coal-forming peats. Lepidodendraleans are represented most frequently in the fossil record as characteristic “tiretrack” bark fragments, which reflect the regular geometric arrangements of leaves, although taxonomic revisions have focused on the more complete information obtained from fossils preserved threedimensionally in petrified peats. Reconstruction of the plants from their component organs has revealed that the classic trees (Fig. 1) possessed a wide range of growth architectures and shared lowland habitats with other smaller-bodied genera, reflecting a major radiation that occurred in the Late Devonian and perhaps Early Carboniferous. Rhizomorphic lycopsids are distinguished by their centralized rootstock (rhizomorph), which permitted bipolar growth; both aerial and subterranean
axes could branch, unlike the unipolar rhizomes of other, more primitive lycopsids. This trait and the newly acquired ability to produce wood allowed many rhizomorphic lycopsids to form large upright trees, with trunks up to 115 ft (35 m) tall and 3 ft (1 m) in diameter. The tree-lycopsids relied primarily on the external cylinder of bark for support rather than the relatively narrow central cylinder of wood, which was adapted primarily for water transport. Photosynthesis occurred largely in the leaf bases and bark, as well as in the narrow, grasslike leaf laminae. The lepidodendraleans consisted of four main growth modules: rhizomorph, stem, lateral branches, and crown branches. Only one of the two branch types was well expressed in any one genus. The growth of the tree-lycopsids was probably very rapid. Trees with disposable lateral branches reached reproductive maturity earlier than those that relied only on crown branches. However, unlike most seed plants, branches functioned primarily for reproduction and propagule dispersal rather than light capture; even when mature, they formed open canopies that cast little shade. The evolution of the group demonstrably entailed progressively increased reproductive sophistication. The more primitive genera possessed bisporangiate cones resembling those of the Selaginellales (Fig. 2). Typically, megasporangia containing “female” megaspores developed at the base of the cone where greater resources were available, and
763
764
Lepidolite
Fig. 1. Reconstructions of tree-sized Carboniferous rhizomorphic tree-lycopsids, illustrating architectural variation. (After R. M. Bateman et al., Experimental cladistic analysis of anatomically preserved arborescent lycopsids from the Carboniferous of Euramerica, Ann. Mo. Bot. Gard., 79:500–559, 1992)
microsporangia containing “male” microspores developed closer to the cone apex. The genus Sigillaria is characterized by the segregation of the two genders of sporangium in different cones, allowing modifications of the female cone without necessarily modifying the male cone. All subsequent genera underwent selective abortion to leave only one functional megaspore in each megasporangium, now having thinner walls and lacking external ornamenmicrosporangium microspores
sporophyll
ligule
vascular supply of cone axis
megaspores
megasporangium Fig. 2. Primitive bisexual cone of the Flemingites type, borne by Paralycopodites, with apically concentrated microsporangia and basally concentrated megasporangia containing several megaspores. (After C. A. Arnold, An Introduction to Paleobotany, McGraw-Hill, 1947)
tation since it could no longer serve any function. The sporophylls then enclosed the megasporangium to form dispersal units that were the largest of any pteridophyta. Termed aquacarps, they typify Lepidophlois and Miadesemia. See SELAGINELLALES. These reproductive changes were accompanied by vegetative modifications. Notably, stelar morphology and leaf base anatomy (the basis of a separate organ taxonomy for bark) became more complex, particularly in the larger genera, and ligules became recessed in pits. In contrast, no such progressive increase in evolutionary sophistication occurred in the gross vegetative morphology of the plants. Frequent, profound, and broadly heterochronic changes in body size favored transitions from trees to shrubs and pseudoherbs (plants that were woody and determinate but small bodied and recumbent). The lepidodendraleans partitioned and occupied most potential ecological niches, at least in the extensive European-American swamps of the Late Carboniferous. The warming and drying of the global climate toward the end of this period greatly reduced the diversity of the rhizomorphic lycopsids, beginning with the most derived and specialized genera such as Lepidophloios and eventually leaving only the increasingly reduced and ecologically specialized isoetaleans to survive to present day. See CARBONIFEROUS; ISOETALES; LYCOPHYTA; LYCOPSIDA. Richard M. Bateman; William A. DiMichele Bibliography. R. M. Bateman, Evolutionary-developmental change in the growth architecture of fossil mizomorphic lycopsids, Biol. Rev., 69:527–597; T. L. Phillips and W. A. DiMichele, Comparative ecology and life-history biology of arborescent lycopsids, Ann. Mo. Bot. Gard., 79:560–588, 1992; W. N. Stewart and G. W. Rothwell, Paleobotany and the Evolution of Plants, 2d ed., 1993.
Lepidolite A mineral of variable composition which is also known as lithium mica and lithionite, K2(Li,Al)5−6(Si6−7,Al2−1)O20−21(F,OH)3−4. Rubidium (Rb) and cesium (Cs) may replace potassium (K); small amounts of Mn, Mg, Fe(II), and Fe(III) normally are present; and the OH/F ratio varies considerably. Polithionite is a silicon- and lithium-rich, and thus aluminum-poor, variety of lepidolite. Lepidolite is uncommon, occurring almost exclusively in structurally complex granitic pegmatites, commonly in replacement units. Common associates are quartz, cleavelandite, alkali beryl, and alkali tourmaline. Lepidolite is a commercial source of lithium, commonly used directly in lithium glasses and other ceramic products. Important deposits occur in the Karibib district of South-West Africa and at Bikita, Rhodesia. The structural modifications show some correlation with lithium content: the six-layer monoclinic form contains 4.0–5.1% Li2O; the one-layer monoclinic, 5.1–7.26% Li2O. A three-layer hexagonal form is also found. There is a compositional gradation to muscovite, intermediate types being called lithian
Lepidoptera
2 cm Group of lepidolite crystals found in Zinnwald, Czechoslovakia. (Specimen from Department of Geology, Bryn Mawr College)
muscovite, containing 3–4% Li2O, and having a modified two-layer monoclinic muscovite structure. Lepidolite usually forms small scales or finegrained aggregates (see illus.). Its colors, pink, lilac, and gray, are a function of the Mn/Fe ratio. It is fusible at 2, yielding the crimson (lithium) flame. It has a perfect basal cleavage. Hardness is 2.5–4.0 on Mohs scale; specific gravity is 2.8–3.0. See MICA; SILICATE MINERALS. E. William Heinrich
Lepidoptera The order of scaly-winged insects, including the butterflies, skippers, and moths. One of the largest orders in the class Insecta, the Lepidoptera include over 150,000 known species (of which about 10,000 occur in North America) divided among more than 100 families. The adults have a covering of hairs and flattened setae (scales) on the wings, legs, and body, and are often beautifully colored. With minor exceptions, the adults are also characterized by two pairs of membranous wings, and sucking mouthparts that feature a prominent coiled proboscis. This feeding apparatus is formed from a pair of specially elongated and grooved lobes, the galeae (a mouthpart), that are closely joined along their length to make a flexible tube. Adults having a proboscis (the vast majority) can take only liquid food, such as nectar and juices of fruits. Butterflies and skippers usually fly in the daytime, while most moths are nocturnal. The caterpillars are almost always herbivorous and chew their food like a grasshopper or beetle. See INSECTA. Morphology. The head of most adult Lepidoptera species is dominated by the bulbous compound eyes (Fig. 1a), as seen in other insects. The most unusual feature of most moths and butterflies, however, is the form of the mouthparts (Fig. 1a–c). The proboscis is thought to be extended by blood pressure created by retraction of special muscles at its base. Other muscles, arranged diagonally within each half of the proboscis, appear to be necessary for the whole proboscis to coil. Liquid is drawn up into the proboscis by the action of a muscular sucking pump inside the insect’s head (Fig. 2a). Simple eyes (ocelli) located near the top of the head (Fig. 1b) are absent in many groups, such as the Hesperioidea and Papilionoidea.
The antennae are variable in form (Fig. 3). Thorax. The prothorax, the first thoracic segment, is well developed in some lower groups, such as the Hepialoidea, but it is considerably smaller than the second and third pterothoracic, or wing-bearing, segments, and is largely membranous in the majority of Lepidoptera. The most prominent feature of the dorsum of the prothorax, in most groups, is a pair of protuberant lobes, the patagia. The paired basal segments (coxae) of the prothoracic legs are cylindrical and functional, while the two pairs of coxae of the pterothorax (mesothorax and metathorax) are fused with the thoracic capsule and are thus immobile. In each segment of the thorax, the midventral suture, the discrimen, is represented internally by a fine membranous lamella. The form of this lamella and its relationship to the furca (a forked internal projection arising from the base of the thorax), especially in the mesothorax, have proved significant for the classification of some Lepidoptera, including butterflies. The mesothorax is the largest of the three segments and may completely overlap the metathorax. There is considerable variation throughout the order in the details of the many different sclerites (the hardened plates that make up the body wall) and their connecting sutures. In the butterflies, interesting variation has also been found in the structure of the gut, but this has not been studied extensively in the rest of the order. Many questions remain unanswered concerning the morphology, function, and evolution of this fundamental body system. Scales. The wing scales are very variable in form (Fig. 1d). Generally, they are flat, thin, cuticular sacks with striated upper surfaces. They have a basal stalk, the pedicel, that fits into a socket in the wing membrane. The males of many species have special scales called androconia. These have feathered tips (Fig. 1e) or various other forms, and are often gathered together as special androconial organs. Their usual function is to disseminate scents, notably those used by the males during courtship. The vast spectrum of colors seen in the Lepidoptera can be grouped into two sorts, pigmentary and structural. The former result from various chemical pigments, such as melanin (black/brown/orange) or papiliochrome (yellow), deposited in the scales. Structural colors are produced either by fine diffracting ridges on the surface of the scales or, more usually, by layers within the scales that are so close together they interfere with light. These diffraction or interference colors are most often metallic or iridescent greens and blues, but the remarkable cryptic green of some hairstreak butterflies (for example, Callophrys rubi) is structural, yet totally noniridescent. In some Lepidoptera, blue bile pigments deposited between the wing membranes also contribute to the overall color pattern, as in species of Graphium (Papilionidae). Wings. In most moths, the fore- and hindwings on each side are coupled together. In females this is achieved by a group of stiff bristles or setae, or
765
766
Lepidoptera
labial palpus patagium tegula compound eye
scutum II
5 mm scutellum II scutum III
spiracle scutellum III
pleuron I
3
2
1
4
6 7
8 9 + 10
8
2 3 episternum II
coxa II epimeron II
proboscis (galeae)
vertex
6 7
postspiracular bar
coxa III
prespiracular bar episternum III
400 µm antenna
5
4
epimeron III
(a)
5
maxillary palpus
ocellus
frontoclypeal sclerite
2 mm galea
stipes
compound eye frons mandible clypeus labrum
maxilla lacinia cardo maxillary palpus
(b)
labial sclerite
(c)
socket of labial palpus
accessory gland
testes (fused)
tegumen
seminal vesicle
200 µm
gnathos ejaculatory duct
uncus rectum
(d)
(e)
vas deferens aedeagus (f) vinculum
valva
saccus juxta
Fig. 1. Adult Lepidoptera anatomy. (a) Lateral view of Danaus plexippus (Nymphalidae) with hairs, scales, wings, and midand hindlegs removed. (b) Front view of head of Epimartyria (Micropterygidae). (c) Ventral view of head of D. plexippus. (d) Unspecialized scales. (e) Androconia (specialized male scales). (f) Male genital system.
in males by a large spine formed from several fused setae that projects forward from the base of the hindwing and is held by a clasp on the forewing. The spine or group of setae is known as the frenulum and the clasp as the retinaculum (Fig. 4a). In homoneurous moths (the shape and venation of the fore- and hindwings are similar; see below), there is a lobe, the jugum, at the base of the forewing, which engages with the hindwing or with the frenulum when it is present (Fig. 4b). In butterflies, skippers, and some moths, the humeral angle of the hindwing is expanded and strengthened by one or more humeral veins (Fig. 4c). In these groups, the frenulum is usually lost, the wings being united functionally by the overlapping lobe.
Genitalia. The external genitalia, together with the anus, usually occupy the last three segments of the abdomen. The form of the external genitalia, especially of the male (Fig. 1f ), has been used widely in the separation and classification of species. Thus, the valva, uncus, gnathos, and saccus often differ widely between groups and sometimes even closely related species. During copulation, the vulva (ostium bursae) of the female receives the phallus (aedeagus) of the male, through which the sperm is passed in a packet, the spermatophore. The female stores the spermatophore in a special chamber, the corpus bursae, and can then fertilize her eggs as they are laid, sometimes many days later. This is achieved in most Lepidoptera by the sperm passing along a seminal
Lepidoptera duct that connects the corpus bursae with the egg duct (oviduct), through which the eggs must pass before they can be laid. In some groups the oviduct terminates in a distinct egg-laying apparatus, the ovipositor.
wing muscles
esophagus brain
aorta
5 mm
colleterial gland
crop
Developmental Stages Development starts with the fertilized egg nucleus, which divides and grows, nourished by nutrients stored inside the egg, to give rise to the first larval stage (first instar). This has to bite its way out of the egg shell to commence its independent development. The larvae of Lepidoptera, commonly called caterpillars, are mandibulate and cylindrical, with short articulated thoracic legs and a variable number of fleshy abdominal prolegs. They have one pair of thoracic spiracles and eight pairs of abdominal spiracles (external breathing orifices). After the larva grows and sheds its skin several times, the skin of the fully grown, last larval instar is eventually shed to reveal the immobile pupa. This is the stage during which metamorphosis takes place, the transformation between the wormlike larval stage and winged adult. Inside the pupa, most of the old larval structures are broken down and used as materials and energy supplies to build the adult insect. The pupae of Lepidoptera are variable in form and are often enclosed in a silken cocoon. See INSECT PHYSIOLOGY; METAMORPHOSIS. Egg. There are two general types of eggs, flattened and upright. In both types the surface of the egg shell (the chorion) is usually sculptured, with the sculpturing on the upright type generally more complex (Fig. 5). There is a microscopic entrance in the chorion, the micropyle, through which sperm can enter. The eggs are usually laid on the food plant, either singly or in clusters, attached by means of a glue or cement. However, the eggs are sometimes merely scattered in the vicinity of the food plant. Larva. The caterpillars are variable in color, size, and shape; the presence of warts, hairs, and setae; the arrangement of hooks (or crochets) on the prolegs; and many other superficial details. They are rather constant, however, in basic morphology (Fig. 6). There are usually six pairs of ocelli, and compound eyes are absent. At most, eight pairs of prolegs are present, but almost every degree of reduction is found; most species have five pairs of prolegs. See CATERPILLAR. Silk is produced by the labial glands. They are long, coiled, simple tubes which unite anteriorly in a cylindrical spinneret, opening at the front of the labium. See SILK. Pupa. There are three main types, although the vast majority are obtect, with the appendages (legs, antennae, wings) completely sheathed with the rest of the body and immobile, and only the terminal abdominal segments movable. In the Hepialoidea, Cossoidea, and other lower moths, more of the abdominal segments are movable (Fig. 7). Although sheathed, the pupal appendages of the Hepialoidea and Cossoidea are partly free, and their pupae are referred to as incomplete. The
rectal sac
ovary
heart
sucking pump
bursa copulatrix
767
anus ovipore oviduct
ganglia salivary gland (a)
midintestine foreintestine
esophagus
ventral nerve cord heart
spermatheca Malpighian tube
hindintestine
brain
ventral nerve cord thoracic legs
spinneret (b)
silk prolegs ganglion gland 1 cm
testis
Fig. 2. Internal anatomy of Danaus plexippus (Nymphalidae) with tracheal (breathing tube) system and most of musculature omitted. (a) Adult female. (b) Larval male.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
(i) Fig. 3. Types of Lepidoptera antennae. (a) Fusiform. (b) Clubbed and hooked. (c, d) Clubbed. (e) Serrate and fasciculate, dorsal view. (f) Doubly bipectinate (quadripectinate). (g) Laminate, lateral view. (h) Bipectinate. (i) Simple and ciliate.
768
Lepidoptera jugum
costal margin Sc
retinaculum
R1
C
R4
R2 + 3 Rs M R 4 + 5 1+2 M M3 Cu
R5 M1 M2
2V + 3V basal angle
inner margin
1V
Cu2
M3 Cu1 outer margin
discal cell
humeral angle
frenulum costal margin M
Sc + R1
Rs
M1 + 2
Cu
2V base
Sc1 Sc R 2 1
H
R2 R 3
V1 1V
Cu2
Cu1
6 mm
(c)
(a)
Sc + R1
R5
Rs M
V
Cu
1 mm Sc + R1 Rs M1
Cu1 Cu2
3V
2V
R4
R+M
R5 M1 M2 M3
Sc
M2 M3
H
R2 R3 R4
R2 + 3
M
Sc R1 R2 R 3 R4 R5 M1 M2 V 2 + 3V Cu2 M3 1V Cu1 2 mm (b)
R1 M1
+3
1V Cu2
Cu 2V
frenulum
Rs
M3
3V2
apex
2 + 3V
Sc R1
R2 R3 R4 R5 M1 M3 M2 Cu1
Cu2 2V
M2 M3 Cu1
5 cm
Fig. 4. Wing venation patterns with the veins labeled. (a) Male Acrolophus popeanellus (Tineidae). (b) Epimartyria (Micropterygidae) and Nepticula nyssaefoliella (Nepticulidae). (c) Danaus plexippus (Nymphalidae). C = costa; Sc = subcosta; R = radius; Rs = radial sector; M = media; Cu = cubitus; 1V, 2V, 3V = vannal or anal veins; H = humeral vein; subscripts refer to branches (for example, R2 is second branch of radius).
most primitive moths, such as the Micropterygoidea and Eriocranioidea, have free (exarate) pupae in which the appendages of the head and thorax are not sheathed, the segments of the abdomen are movable, and the outer covering (integument) is soft. These pupae are also decticous, having functional mandibles. All Lepidoptera with obtect pupae, including those that are incomplete, are adecticous, that is, lacking functional mandibles.
500 µm
1 mm (a)
(b)
Fig. 5. Butterfly eggs. (a) Satyrium acadica (Lycaenidae). (b) Anthocharis midea (Pieridae).
Classification The Lepidoptera were long divided into two suborders, the Rhopalocera, with clubbed antennae, notably the butterflies and skippers, and the Heterocera, with antennae of other forms, as in most moths. The butterflies and skippers form a major natural unit with over 15,000 species, still often referred to as the Rhopalocera. They are, however, closely related to various higher moths, and the simple division of Lepidoptera into Rhopalocera (butterflies) and Heterocera (moths) is not tenable. The divisions Macrolepidoptera and Microlepidoptera often appear in older literature. The microlepidoptera comprise the more ancient groups, the most primitive of which have adults with chewing mouthparts and two pairs of similar wings (homoneurous moths), and include all of the families from Micropterygidae to Thyrididae listed in Table 1. Generally these are small insects, some with wingspans as small as 3 mm (0.1 in.). The macrolepidoptera include most of the larger moths and the butterflies, with wingspans typically in the 20–80-mm (0.8–3.2-in.) range, with a few reaching as much as 280 mm (11 in.). However, a few microlepidoptera, such as certain cossids and hepialids [wingspans up to 200 mm (7.9 in.) and more], are much larger than many macrolepidoptera, which can have wingspans as small as 8 mm (0.3 in.). In terms of common descent, the Macrolepidoptera do seem to form a natural group, but the microleps definitely do not, and the term Microlepidoptera is not used now, except
Lepidoptera informally (for example, micromoth). Recent comparative research on Lepidoptera indicates that the major groups evolved in a sequential stepwise fashion. This makes natural division into just two or three large groups impossible. The very oldest moths, those with functional adult mandibles, are placed in suborder Zeugloptera, with no more
girdle antenna mandible forewing
larva 2 mm (a) 500 µm
proleg
25 µm
(a)
5 cm (b) osmeterium
(b)
15 mm (c)
Fig. 7. Pupae. (a) Dyseriocrania auricyanea (Eriocraniidae). (b) Sthenopis thule (Hepialidae). (c) Papilio troilus (Papilionidae).
antenna
50 µm
15 mm
3 cm
2 cm
1 mm (c) Fig. 6. Larvae. (a) Newly hatched larva of Micropterix calthella (Micropterygidae). (b) Hickory horned devil, larva of Citheronia regalis (Saturniidae). (c) Skipper and butterfly larvae: mature Thorybes pylades (Hesperiidae), showing neck; mature Papilio cresphontes (Papilionidae), osmeterium extruded; first instar Satyrium liparops (Lycaenidae), about 2 mm (0.08 in.) long.
than about 200 known species. Two small and very obscure groups are considered to be the next oldest known Lepidoptera, the Agathiphagidae and the Heterobathmiidae. All three of these ancient groups lack sucking mouthparts. A major innovation was evolution of the proboscis, giving rise to the socalled Glossata suborder. The living moths that represent the earliest stage of this development are the Eriocraniidae. A further novelty was development of hollow scales, a feature also shared by the great majority of living Lepidoptera. Some of these Coelolepida then evolved muscles that coil the proboscis, and the obtect type of pupa. All Lepidoptera with coilable proboscides, obtect pupae, and larvae with crotchet-bearing abdominal prolegs belong to the Neolepidoptera. When this stage was reached, these Lepidoptera evidently still had two very similar pairs of wings (Homoneura), as seen in ghost moths (Hepialidae). A further step was the evolution of a marked difference between the fore- and hindwings (Heteroneura). A subgroup of these moths then underwent a major change in the female reproductive system, developing separate orifices for copulation (vulva) and egg laying (oviduct) that are connected by a seminal duct (Ditrysia). By this stage it would seem that the Lepidoptera were increasingly successful, evolving step by step many new families. The Apoditrysia evolved with a modification of the thorax; then fusion of the first four abdominal segments of the obtect pupa led to the Obtectomera (although validity of the group based on this character is open to question, as this condition also occurs in some lower Ditrysia, including the Yponomeutoidea, Gelechioidea, and Alucitoidea). Finally, evolution of the Macrolepidoptera occurred, in which all of the many species share a special type of wing base. The familiar butterflies are subordinate members of the Macrolepidoptera, nested inside this huge and highly diverse “crown” group. It is evident that Lepidoptera classification cannot simply reflect a set of major alternatives, such as butterflies versus moths, or Macrolepidoptera versus Microlepidoptera, but needs to mirror the successive stepwise evolution of one group after another. Thus, not only are the butterflies Macrolepidoptera,
769
770
Lepidoptera
TABLE 1. Size, distribution, and common names of some families of Lepidoptera Classification
Common name
Distribution
Number of species∗
ZEUGLOPTERA Micropterygidae
Micropterygids
Cosmopolitan
OTHER NON-GLOSSATA Agathiphagidae Heterobathmiidae
Agathiphagids Heterobathmiids
Queensland, eastern Pacific Southern South America
GLOSSATA Eriocraniidae
Eriocraniids
Holarctic
5 (24)
COELOLEPIDA Acanthopteroctetidae Lophocoronidae Neopseustidae
Lophocoronids Neopseustids
North America, Peru, Crimea Australia India to Taiwan, southern South America
4 (6) (6) (10)
NEOLEPIDOPTERA Mnesarchaeidae Hepialidae
Mnesarchaeids Swift or ghost moths
New Zealand Cosmopolitan
(14) 18 (500)
HETERONEURA Nepticulidae Incurvariidae Prodoxidae
Serpentine leaf miners Incurvariids Yucca moths and relatives
Cosmopolitan Mainly Palearctic and Australia North America (one species in southern South America)
75 (800) 3 (100) 75
DITRYSIA Tineidae Psychidae Gracillariidae Yponomeutidae Heliodinidae Oecophoridae Coleophoridae Gelechiidae Cosmopterigidae
Clothes moths and relatives Bagworms Gracillariids Ermine moths Heliodinids Oecophorids Case bearers Gelechiids Cosmopterigids
Cosmopolitan Cosmopolitan Cosmopolitan Cosmopolitan Cosmopolitan Cosmopolitan but largely Australian Cosmopolitan Cosmopolitan Cosmopolitan
135 (3000) 25 (1000) 235 (2000) 65 (600) 21 (60) 225 (3200) 110 (1400) 590 (4600) (1650)
APODITRYSIA Megalopygidae Limacodidae Zygaenidae Cossidae Sesiidae Castniidae Tortricidae Alucitidae Pterophoridae
Flannel moths Slug moths Foresters and burnets Goat or carpenter moths Clearwing moths Castniids Tortricids Many-plume moths Plume moths
New World, mainly tropical Cosmopolitan Worldwide except New Zealand Cosmopolitan Cosmopolitan Neotropical and Indo-Australian Cosmopolitan Cosmopolitan Cosmopolitan
11 (250) 50 (1000) 20 (1000+) 45 (670) 120 (1200) (∼150) 925 (5000+) 1 (135) 130 (1000)
OBTECTOMERA Pyralidae Crambidae Thyrididae
Pyralids Snout moths Window-winged moths
Cosmopolitan Cosmopolitan Tropical
500 (6,000+) 700 (11,500+) 10 (760)
MACROLEPIDOPTERA Lasiocampidae
Tent caterpillars, lappet moths
Cosmopolitan except New Zealand; mainly tropical Tropical Cosmopolitan Cosmopolitan Cosmopolitan except New Zealand and Pacific Cosmopolitan Largely African and Indo-Malayan, but with important Holarctic species Cosmopolitan Holarctic Cosmopolitan
Bombycidae Saturniidae Sphingidae Notodontidae Noctuidae Lymantriidae
Silkworm and allies Giant silkworms Sphinx, hawk, or hummingbird moths Prominents, puss moths Noctuids, owlets, underwings, millers Tussock moths
Arctiidae Drepanidae Geometridae
Tiger moths Hooktips Measuring worms, loopers, cankerworms, carpets, waves, and pugs Uraniids Hedylids
Uraniidae Hedylidae RHOPALOCERA Hesperiidae Papilionidae Pieridae Lycaenidae Riodinidae Nymphalidae
Skippers, agave worms Swallowtails, bird-wings, parnassians Whites, sulfurs, orange-tips Blues, coppers, hairstreaks Metal marks Brush-footed butterflies
Tropical Neotropical
Cosmopolitan Cosmopolitan Cosmopolitan Cosmopolitan Mainly Neotropical, but all major regions Cosmopolitan
3 (200)
(2) (10)
30 (1500) 1 (introduced) (350) 60 (1480) 106 (1250) 120 (2800+) 2700 (35,000+) 27 (2500+) 200 (∼6000) 6 (650) 1200 (21,000) (700) (40)
240 (3500) 27 (600) 61 (1000) 138 (5000) 1 (1250) 211 (6000)
∗The first figure is the number of described species in North America north of Mexico. In most cases this figure is reasonably accurate. The figure in parentheses is a good estimate of the number of described species in the world. However, it is difficult to derive the total number of living Lepidoptera species from these figures. While in groups such as the Papilionidae more than 90% of existing species have almost certainly been recognized, in others, such as many families of Microlepidoptera, the figure may be well under 25%. Classification is based largely on N. P. Kristensen (ed.), 1999.
Lepidoptera but at successively higher levels they are also members of the Obtectomera, Apoditrysia, Ditrysia, Heteroneura, Neolepidoptera, Coelolepida, and Glossata (Table 1). Butterflies are in reality just one of several advanced group of moths. Of all the major formed groups indicated in Table 1, the only group that the butterflies do not belong to is the suborder Zeugloptera, and the same is true for all the other groups listed, insofar as they are included within all the major divisions above their position in Table 1. Some of the most important Lepidoptera families are also listed in this table. Homoneurous moths. In Table 1, the first nine families listed, up to the Heteroneura, have a primitive homoneurous wing condition, with the fore- and hindwings similar in shape and venation. They do not form a natural group, but it is convenient to deal with them together. Their fore- and hindwings are connected by a jugum and sometimes also by a frenulum. Mouthparts either are mandibulate, with the mandibles functional, vestigial, or absent (Fig. 8), or consist of a rudimentary proboscis. The females have a single genital opening, except in the Mnesarchaeoidea and Hepialoidea, which have two openings, one for copulation and the second for egg laying, but they lack the connecting seminal duct found in the Ditrysia. The pupae are exarate (free) or incomplete. These primitive moths, making up less than 1% of Lepidoptera species, share a number of characteristics with the Trichoptera (caddis flies), to which the Lepidoptera are very closely related. In addition to the three groups described below, six other families of inconspicuous homoneurous moths are listed in Table 1, several of which were discovered only in the 1970s.
Superfamily Micropterygoidea. These moths make up one family, the Micropterygidae. They are small moths [up to 15 mm (0.6 in.) wingspan but usually much less], and the adults have asymmetrically toothed, functional mandibles and lack even a rudimentary proboscis. The galea is short and the adults feed on pollen or fern spores. The larvae, which feed variously on mosses, liverworts, angiosperm leaves, and possibly even fungal hyphae, mostly lack abdominal appendages, although some are very unusual in having eight pairs of simple abdominal prolegs. It has been suggested that these moths are terrestrial Trichoptera or that they should be placed in a separate order, the Zeugloptera. However, characters of the wing venation, the broad well-developed scales with numerous ridges, and many other features indicate that these insects do belong to the Lepidoptera. Superfamily Eriocranioidea. In this group of small moths [never more than 15 mm (0.6 in.) wingspan], the adult mandibles are greatly reduced and untoothed, and the galeae form a simple and abbreviated proboscis that cannot be coiled. These moths appear to represent the earliest stage in the evolution of the Glossata. Formerly, three or even more families were placed in this group, but the Neopseustidae, Mnesarchaeidae, and Acanthopteroctetidae are now considered to represent separate, further stages in glossatan evolution (Table 1). The single remaining family, the Eriocraniidae, have legless leaf-mining larvae. The adults fly in sunshine and drink sap from injured leaf buds. The females have a cutting (piercing) ovipositor used to insert their eggs into leaf buds. Superfamily Hepialoidea. The hepialoids are mediumsized to very large moths possessing short, not
Fig. 8. Mouthparts of adult homoneurous moths. (a) Mandibles of Sabatinca incongruella (Micropterygidae). (b) Maxilla of Micropteryx aruncella (Micropterygidae). (c) Maxilla of Eriocrania semipurpurella (Eriocraniidae). (d) Maxillae and labium of Mnesarchaea paracosma (Mnesarchaeidae).
fully coilable proboscides that are probably nonfunctional. Together with the Mnesarchaeidae (Mnesarchaeoidea), they form the Exoporia, a group representing the first stage in the evolution of the Neolepidoptera. Hepialoid larvae are borers in stems or roots, either feeding in the tunnels they create or emerging to eat vegetation; some feed on decaying wood and fungi. The adults are rapid flyers and mostly crepuscular (active at dusk or just before dawn), giving them the common names swift moths or ghost moths. There are five families, the best known of which is the Hepialidae with about 500 known species. Some have wingspans approaching 250 mm (9.8 in.). Heteroneura. Fore- and hindwings in the Heteroneura are markedly different in shape and venation. Usually they are connected by a frenulum and retinaculum. The sucking mouthparts are occasionally vestigial, but the majority of heteroneurans are able to feed on nectar, juices from rotting fruits, and other liquids. The females of all except the Nepticuloidea and Incurvarioidea have two genital openings. The pupae are obtect. The Heteroneura appear to be a major natural group that comprises all the rest of the Lepidoptera, including the Macrolepidoptera. This section deals with the heteroneurous microlepidoptera, leaving the macrolepidoptera to the third section. Superfamily Nepticuloidea. Two very closely related families are included, Nepticulidae and Opostegidae. These tiny moths have wing spines and females have a single genital opening, but they differ from the Incurvarioidea in having reduced venation, a large eye-cap at the base of the antenna, and a short nonpiercing ovipositor. The larvae, with the exception of some gall-making species (insect galls are enlargements or swellings of plant tissue due to the feeding stimulus of the insects) of the genus Ectoedemia, are miners in leaves, bark, and rarely fruits. The pupae have the leg bases free. Many species of the genus Nepticula have a wing expanse in the 3–5-mm (0.1–0.2-in.) range, being the smallest insects in the order. Superfamily Incurvarioidea. Six families are included in this group, with all members having an extensible, piercing ovipositor in the female, and a portable case in the last larval instar. As in the Nepticuloidea and homoneurous moths, the females have a single genital opening. The Incurvariidae have wings covered with microscopic spines, and almost complete venation. The larvae are seed, leaf, stem, or needle miners. In the subfamily Incurvariinae, the larva is first a miner and then a case bearer. With up to 300 species, the fairy moths (Adelidae) are the largest incurvarioid family. Male adelids have spectacularly long antennae, and some species make swarming flights in bright sunshine; the larvae feed on dead leaves and low plants. The yucca moth, Tegeticula yuccasella, belongs to the family Prodoxidae, almost all members of which are North American. This small white moth has an obligatory relationship with yuccas. The females gather pollen with their specialized mouth-
Fig. 9. Thyridopteryx ephemeraeformis (Psychidae). (a) Male. (b) Male case.
parts and fertilize the yucca flower. The eggs are laid, using the piercing ovipositor, in the plant ovary, where the larvae grow by eating some of the developing seeds. Superfamily Tineoidea. The Tineoidea formerly included a large and heterogeneous assemblage of moths, but only four tineoid families are now accepted, with the other groups variously reassigned. The tineoids can be regarded as the oldest or most primitive members of the Ditrysia. The Psychidae comprise the remarkable bagworms. The males are hairy, strong-bodied, swift-flying moths with reduced mouthparts. They are relatively large, some with a wing expanse of about 25 mm (1 in.). The females are wingless, legless, and often sluglike. They live concealed in bag-shaped cases made by the caterpillars. The best-known North American representative is Thyridopteryx ephemeraeformis (Fig. 9). In this species, the larva fastens the bag to a twig and pupates within it. The vermiform (wormlike) female emerges from the pupa and moves to the bottom of the bag, where she is fertilized. This is accomplished from the outside by the highly specialized extrusible genitalia of the male. The female deposits her eggs in the bag, then drops to the ground and dies. The Tineidae are a family of small, mostly drab moths that often have an erect tuft of yellowish scales between their antennae. The proboscis is reduced or vestigial, but the wing venation is fully developed and only rarely somewhat reduced; the ovipositor is generally long and extensible. The larvae do not feed on green plants but on a variety of substrates, including detritus, fungi, lichens, and notably keratin (wool, skin, etc.). The larvae of many species are case bearers. The best-known species are the clothes moths, which include the case-making clothes moth (Tinea pellionella), the webbing clothes moth (Tineola bisselliella), and the carpet moth (Trichophaga tapetzella); all are important pests whose larvae devour wool and other animal products. These three species have wing expanses of less than 25 mm (1 in.). Superfamily Gracillarioidea. This group includes four ditrysian families that share a number of specialized
features. The Gracillariidae, by far the largest of the four in terms of included species, have lanceolate (shaped like the head of a lance) and widely fringed pairs of wings, well-developed proboscides, and nonextensible ovipositors; and they comprise the principal group of leaf-mining Lepidoptera. The young larvae are flattened and have bladelike mandibles with which they slash the cells of the leaf, sucking up the exuding juices. When full-grown, the larvae of some species are quite different, being normal in appearance and feeding on leaf parenchyma, either in a mine or externally. In Phyllocnistis the final instar does not feed, lacks legs and mouthparts (other than the spinneret), and pupates within its mine (a gallery it creates within the leaf tissue). Superfamily Yponomeutoidea. This is an assemblage of rather primitive micromoths, with eight families somewhat uncertainly included. The Yponomeutidae represent a heterogeneous assortment of about 600 small, often brightly colored moths that lack ocelli. Most of the 80 or so species of Yponomeuta (ermine moths) have larvae that live in communal webs sometimes covering whole trees or shrubs. The Heliodinidae are a small family of tiny moths with ocelli and smooth heads that often have brilliant metallic spots on the forewings. The larvae are leaf miners, borers in stems, buds, or galls, or external leaf feeders; sometimes in communal webs. Superfamily Gelechioidea. This is a very large group of some 15 families, representing a combined total of over 16,000 described species of micromoths, with wingspans in the range 4–70 mm (0.16–2.7 in.). The superfamily is represented worldwide, but certain families or subfamilies are mainly associated with particular zoogeographical regions. It is anticipated that many thousands more species of these moths await discovery. The group is characterized by the overlapping scales found on the dorsal surface of the proboscis, although this feature does recur in some Ditrysia. See ZOOGEOGRAPHY. Oecophoridae are small to moderately small moths, some of which have a comb of bristles, the pecten, on the antennal scape (shaft). The larvae are varied in habits: some feed in webs or rolled leaves, whereas others are scavengers, notably in Australia where certain species specialize on the tough fallen leaves of Eucalyptus. Stathmopoda larvae feed on coccids, and some related species feed on spider eggs. Hofmannophila pseudospretella is an important pest of stored products, including grain, dried fruits, and furs. Coleophoridae are small, narrow-winged moths. The young larvae of most species are leaf miners, but from the second instar they become case bearers, carrying shells made out of silk and bits of leaves, to which they add as they grow and within which they eventually pupate. Other coleophorids feed in seed heads, mine pine needles, or are inquilines in galls. Holcocera pulverea (subfamily Blastobasinae) is an important predator of the lac insect in India. The Gelechiidae are a very large family of small to minute moths, usually with rounded forewings
or, rarely, pointed ones. The hindwings are trapezoidal and often pointed, with the outer margin excavated or sinuate below the apex. The larvae include seed feeders, miners, borers, gall makers, and foliage feeders, and are known to attack over 80 different plant families. The group includes a number of economically important species, such as the Angoumois grain moth (Sitotroga cerealella), which infests grain both in the field and in storage, and the pink bollworm (Pectinophora gossypiella), an extremely important pest of cotton worldwide. The Cosmopterigidae, with over 1600 species, are the closest relatives of the Gelechiidae. While most are plant or detritus feeders, the larvae of Euclemensia are parasitoids, feeding internally in armored scale insects. Superfamily Zygaenoidea. Currently classified as the first group of the Apoditrysia, the zygaenoids comprise a dozen families apparently linked by having a retractile larval head, and the second abdominal spiracle of the pupa covered by the wings. These moderately small to sometimes quite large moths have complete venation and in some cases a rudimentary proboscis; the wings are usually broad with short fringes. The larvae are stout and sluglike, and are exposed feeders. The Himantopteridae are a curious group of about 40 species restricted to the Old World tropics. The hindwings of these small or middle-sized moths are very narrow with long ribbonlike tails, and the body and wings are covered with long hairs. The larvae of some species feed on trees (dipterocarps and oaks), but it has been suggested that others live in termite colonies. Supposedly the newly emerged moths escape when attacked by the termites, sacrificing their long hairs that easily pull free and their expendable tails. Recent work has failed to substantiate this, but the potentially fascinating biology of these moths is poorly known. The Megalopygidae are restricted to the New World and are mainly tropical. The head and thorax are densely scaled, and the abdomen is very hairy. The larvae of some species are pests of palms, ferns, and various other plants, and their penetrating setae (they are often covered by long hairs) can cause dermatitis in humans. The family is considered to be related to the Limacodidae. The Limacodidae are a family of heavy-bodied, often brightly colored, hairy moths (Fig. 10). The
Fig. 10. Prolimacodes badia (Limacodidae).
larvae are short and especially sluglike (slug caterpillars), with a large head concealed by the thorax, and furnished with various spines, tubercles, and even gelatinous warts. The thoracic legs are small or minute, and the abdominal prolegs are replaced by midventral suckers that are used, together with a fluid secreted by the spinneret, to stick the caterpillars tightly to foliage. The best-known North American form is the saddleback caterpillar (Sibine stimulea) which, like many limacodid larvae, has irritating (urticating) hairs. The Zygaenidae are medium-sized, often brightly colored, mainly day-flying moths found in most regions (but not New Zealand), with over 1000 known species. They appear to be chemically protected, and if attacked many produce cyanide, a poison to which they appear to be resistant. Many zygaenids are involved in mimetic associations, notably the Oriental subfamily Chalcosiinae, some of which look very similar to aposematic (pertaining to colors of an organism that give warning of its special means of defense against enemies) butterflies such as Danainae. The larvae are rather stout and feed openly on plants. In some groups, notably the large genus Zygaena, the pupa is formed within a characteristic parchmentlike cocoon spun on a grass stem or similar substrate. See PROTECTIVE COLORATION. Superfamily Cossoidea. Two families are included. Because the larvae bore tunnels in wood and smell rather strongly, the Cossidae are commonly called carpenter moths or goat moths. They are heavy-bodied insects, sometimes very large [maximum 235-mm (9.2 in.) wingspan], with the abdomen extending well beyond the hindwings. The mouthparts are largely rudimentary. The median vein (M, Fig. 4) is present and usually forked within the discal cell (a large cell in the central, or disc, part of the wing) of both wings. Although the larvae of some species feed externally on stems or roots, most are tunnelers, even in hardwood trunks. Prionoxystus robiniae, the best-known American cossid, is very destructive to a large variety of deciduous trees, and Zeuzera coffeae damages numerous species, including coffee. Like the Cossidae, the Dudgeonidae, a group of six stem borers found in the Old World tropics characterized by an abdominal hearing organ, also have a very strongly reduced proboscis. Superfamily Sesioidea. This group of three families is thought to be very closely related to the Cossoidea. The Sesiidae are called clearwing moths because of the large, transparent, scaleless areas on the wings. Veins Sc and Rs on the hindwing appear to be absent, but are actually concealed in a unique costal fold that acts as an additional wing-coupling mechanism. Diurnal, with contrasting black and transparent patterns, or more brightly colored, many of these moths are excellent wasp mimics. The larvae are mostly borers, and many are economic pests, including the currant borer (Synanthedon tipuliformis), peach-tree borer (Sanninoidea exitiosa), and squash-vine borer (Melittia satyriniformis). The South American Synanthedon coccidivora is carnivorous on coccids. The Castniidae are large [wingspan range of 24–
190 mm (0.9–7.5 in.)], diurnal, butterflylike moths with broad wings, clubbed antennae, and fusiform eggs. The larvae bore into monocots, and some are minor pests. A proboscis may be present or (rarely) vestigial. These moths have been considered to be related to the butterflies, but the resemblances must be due to convergence, and some are mimetic of butterflies and other moths. They are found in the Neotropical and Indo-Australian regions. Superfamily Tortricoidea. These small, broad-winged moths comprise just a single very large family (over 5000 species), the Tortricidae, currently divided into three subfamilies. The best character for the group as a whole appears to be the large, flat ovipositor lobes of the females. The hindwings of the moths belonging to subfamily Olethreutinae usually have a fringe of long hairs on the upper side, running along the basal part of the cubitus. The larvae are generally hidden feeders and live in rolled leaves, in foliage webbed together, or inside fruits. The subfamily contains a number of agriculturally very undesirable species, notably the codling moth (Cydia pomonella), a serious pest of apples and other fruits. The genus Cydia (formerly called Laspeyresia) also contains the curious Mexican-jumping-bean moth, C. saltitans. The violent movements of the larvae feeding inside are responsible for the jumping action of the beans. Moths placed in the subfamily Tortricinae generally lack the long cubital fringe on the hindwing characteristic of the Olethreutinae. The spruce budworm (Choristoneura fumiferana), which belongs to the large tortricine tribe Archipini, is probably the most injurious tortricid. In many places, especially eastern Canada, the larvae of this moth have defoliated vast areas of coniferous forest. Members of the third and smallest subfamily, the Chlidanotinae, are largely tropical and relatively poorly known. Superfamily Alucitoidea. Two small families of rather delicate micromoths are included. The pupae have immovable abdominal segments I–IV. The Alucitidae are the larger of the two families, with larvae that bore into buds, shoots, flowers, and fruits or make galls; the African Alucita coffeina is a pest of coffee. The adults of most species have the forewings divided into six featherlike plumes, and the hindwings into six or seven. Although this superfamily is now believed to be very closely related to the plume moths (Pterophoridae), the striking similarity of wing division appears to be a case of convergence. Superfamily Pterophoroidea. The Pterophoridae are known as the plume moths. The wings of most species are divided into featherlike plumes, of which there are usually two in the forewing and three in the hindwing (Fig. 11), similar to the Alucitidae. Although the alucitoids are now thought to be closely related, the evidence is ambiguous; the Pterophoroidea may be closer to the Pyraloidea. The moths lack ocelli and have characteristic slender bodies and long legs, with the wings at rest held outstretched; segments I–IV of the pupae are movable. The larvae feed exposed or are borers. Buckleria is remarkable for feeding on insectivorous sundews
Fig. 11. Platyptilia carduidactyla (Pterophoridae).
(Droseraceae); the North American Platyptilia carduidactyla (Fig. 11) is a minor pest of artichokes. Superfamily Pyraloidea. The Pyraloidea comprise one of the largest family-level divisions of the Lepidoptera, with over 16,000 named species, and the expectation is that this will rise to 30,000 or more when the taxonomy is worked out. Currently the group is divided into two families: the Pyralidae with 5 subfamilies and Crambidae with 15. All are linked by possession of a unique type of hearing organ located on the first abdominal segment. They are small to quite large moths, with a wing expanse in the range of 10–150 mm (0.4–6 in.), usually about 20–35 mm (0.8–1.4 in.). The hindwings often have three anal veins, while the legs are usually long and slender. The Pyralidae differ from the Crambidae in having the case of the hearing organ almost completely closed. The relatively small subfamily Galleriinae includes the bee moth, or wax worm (Galleria mellonella), which lives in beehives. The larvae feed on the wax at night and destroy the combs; the species occurs throughout the range of the honeybee. Subfamily Pyralinae reaches its richest development in the Old World tropics. The meal moth (Pyralis farinalis) is a cosmopolitan pest of stored products. The Phycitinae comprise a large group of moths (almost 4000 species) in which the frenulum of the female, like that of the male, is a simple spine rather than a bundle of bristles. The larvae have very diverse habits, being leaf rollers, case bearers, borers, and stored-products pests, and some (genus Laetilia) are predacious on coccids. The Indian meal moth (Plodia interpunctella) is cosmopolitan, feeding on a wide variety of stored products, especially cereals. Another extremely important pest is the Mediterranean flour moth (Anagasta kuehniella), which infests cereals throughout the world. In contrast to these harmful species, the subfamily Phycitinae also contains Cactoblastis cactorum, imported from South America into Australia, Hawaii, South Africa, and other areas to help control introduced Opuntia cactus, which had ruined millions of acres of pasture. This successful program is an outstanding example of biological control. See ENTOMOLOGY, ECONOMIC; INSECT CONTROL, BIOLOGICAL.
Crambidae are separated from the Pyralidae by the wide aperture to the hearing organ case and differences in the male genitalia. Subfamily Crambinae (snout moths) contains small insects common in marshes and grasslands. In this group the labial palpi (Fig. 1a) are long and porrect (extended forward), giving the adults a beaked appearance. Species of the small subfamily Schoenobiinae bore in the stems of marsh-living grasses, with Scirpophaga and Rupela being important rice pests. Members of the subfamily Nymphulinae are notable for being almost entirely aquatic, and in some species the larvae are able to swim from plant to plant. The larvae of some nymphulines have tracheal gills, and the pupae are usually enclosed in a cocoon below the surface, from which the adults emerge through the water. In the genus Acentria, most female adults are wingless and reproduce parthenogenetically, never emerging from the water; males and winged sexual females emerge periodically, swarm, mate, and disperse. With about 7400 species, the Pyraustinae are the largest group of pyraloids and include some relatively large moths with wingspans of 30 mm (1.2 in.) or more. This subfamily includes the infamous European corn borer (Ostrinia nubilalis), the grape leaf folder (Desmia funeralis), and numerous other economically damaging species. Superfamily Thyridoidea. This mainly tropical group comprises a single family, the Thyrididae, with about 760 described species and many undescribed. They have often been included in the Pyraloidea, but they lack abdominal hearing organs and have a peculiar resting posture in which the body is raised up steeply and the wings held outstretched. Larvae of the Oriental genus Glanycus can stridulate (make audible sounds, produced by rubbing various body parts together), possibly a warning signal as these brightly colored moths appear to be chemically protected. Macrolepidoptera. All of the remaining groups dealt with below are macroleps. Unlike the microlepidoptera, the macros show some morphological evidence of forming a natural group, and they are now formally recognized as such. The Macrolepidoptera are the crown group of the order and include some 100,000 different sorts—more than half of all known Lepidoptera species. The largest moths and butterflies are included, and even the smallest are larger than nepticulids and other micros. Because of their size, the larvae of most macros are external feeders, but there are exceptions. Superfamily Lasiocampoidea. Two families, somewhat uncertainly united but both apparently related to the Bombycoidea, are included in this group. The Anthelidae are a small family restricted to the Australian region. Male Lasiocampidae always have bipectinate antennae (Fig. 3h), right to the tip; the females are usually similar but with shorter rami. This widespread, largely tropical family contains the eggar and lappet moths, and the familiar tent caterpillars of the genus Malacosoma. In this genus the larvae live together in large silken nests. Those of M. americana are a common sight on trees and shrubs in the eastern United States.
Superfamily Bombycoidea. A group of nine families of often stout-bodied large moths, including some of the most spectacular insects in the whole order. These families appear to be reliably grouped together by a number of characters, including the anterior fusion of the forecoxae in the last instar larvae, and a peculiarity of the forewing radial venation. The family Eupterotidae represent a group of about 300 species, characterized by the last tarsal segment of the metathoracic leg of the female having a midventral row of spines. On this basis, the sexually dimorphic (having secondary morphological differences between the sexes) African genus Hibrilides, in which the females apparently mimic aposematic butterflies, has recently been included in this largely tropical family. See SEXUAL DIMORPHISM. The Bombycidae include the commercial silkworm (Bombyx mori). The larvae of this totally domesticated species feed on mulberry leaves, and resemble hawkmoths in that they possess a small caudal horn. The silk from the white or yellow cocoons has been used since prehistoric times. The caterpillars are subject to the ravages of a disease known as pébrine, caused by the protozoon Nosema bombycis. They are also attacked by two viral diseases known as grasserie and flacherie. The Bombycidae are recognized as a group based on characters of the thorax and pupa, and worldwide the family includes about 350 species. See MICROSPORIDEA. Saturniidae are the giant silkworms, medium-sized to extremely large insects having a single, often weak humeral vein in the hindwing. The mouthparts are reduced or vestigial, and the antennae are strongly bipectinate. As in the Lasiocampidae, the rami of the male antennae are conspicuously longer than those of the female. The frenulum is absent. The larvae usually bear spiny processes known as scoli. Pupation takes place in large silken cocoons, which are suspended from twigs or formed in leaf litter on the ground. Females of Attacus atlas of the Himalayan foothills and Coscinocera hercules of Australia have wingspans of up to 250 mm (9.8 in.) and possibly more, and these moths are thought to have the largest wing areas in the Insecta. A number of saturniids produce usable silk, which has long been gathered in the Orient. Among these are the Japanese oak silkworm (Antheraea yamamai) and the muga silkworm (A. assamensis). The nearly 1500 known species of Saturniidae, which are mostly tropical, are currently divided among nine subfamilies. In North America, the best-known species of the subfamily Saturniinae are the cecropia (Hyalophora cecropia), polyphemus (Antheraea polyphemus), luna (Actias luna), promethea (Callosamia promethea), and cynthia (Samia cynthia). Likewise in North America, the regal moth, or hickory horned-devil (Citheronia regalis), and the imperial moth (Eacles imperialis) are the most familiar members of the subfamily Ceratocampinae. In subfamily Hemileucinae, the buck moths and relatives, the io moth (Automeris io) is noteworthy because its common larvae, like those of some other members of the subfamily, are equipped
with urticating (poisonous) hairs that can cause considerable irritation to the unwary who handle them. The scoli of some tropical species release powerful anticoagulants that can induce bleeding from mucous membranes. Sphinx, hawk, or hummingbird moths constitute the family Sphingidae, a worldwide group of some 1250 species. Morphological peculiarities of the larvae, pupae, and adults suggest that the family is a natural group. Mainly medium-sized to very large, these heavy-bodied moths have extremely rapid flight. The adults are mostly crepuscular or nocturnal, but a few genera are diurnal, including the hummingbird hawks. They have rather characteristic antennae that are thickened and have a pointed apex, while the proboscis is well developed and often extremely long. The wings are narrow, with the hindwing much shorter than the forewing. A frenulum is present, rarely reduced to a tubular thickening. The larvae are external feeders and usually have a conspicuous caudal horn. The pupa is formed in a cell in the ground or in a loose cocoon at the surface, and in some species the long proboscis is housed in a projecting case resembling a jug handle. One of the most widely distributed species is the white-lined sphinx (Hyles lineata). Well known also is the Old World death’s-head sphinx (Acherontia atropos), the adults of which will enter beehives in search of honey. Of economic importance in the Western Hemisphere are the tomato hornworm (Manduca quinquemaculatus) and the tobacco hornworm (M. sexta), both of which are pests of solanaceous plants. Erinnyis ello is a serious pest of cassava, rubber, and ornamental euphorbs in South America, while Cephonodes hylas is a major pest of coffee in many parts of Africa and Asia. Superfamily Noctuoidea. This huge superfamily contains some 50,000–70,000 species, by far the largest family-level group in the Lepidoptera, and likely to remain so even if the Gelechioidea and Pyraloidea are ever fully worked out. Although rather uniform in basic structure, which has made their classification very difficult, the noctuoids are remarkably varied in size, coloration, and biology. As measured by wingspan, they include the smallest macromoths, down to about 8 mm (0.3 in.), and the largest: the South American Thysania agrippina sometimes attains wing spans of 280 mm (11 in.). Currently eight families, some with numerous subfamilies, are included. All members of this great assemblage are linked by possession of a special metathoracic hearing organ that is capable of perceiving ultrasound. This organ not only may be used to detect and escape from bats but may also function in courtship signaling. See PHONORECEPTION. Notodontidae are commonly called prominents, or puss moths. Together with two very small families (Oenosandridae from Australia and the New World Doidae), they are separable from most other Noctuoidea by the apparently three-branched, or trifid, cubitus. The larvae are external feeders, and the pupa is formed in a cell in the ground or in a loose cocoon on the surface. The Nearctic Datana
ministra, the yellow-necked caterpillar, and Schizura concinna, the red-humped caterpillar, are pests on apples and other trees. Noctuidae (once known as the Phalaenidae) are the owlet moths, an extremely large grouping of 35,000 or more mostly dull-colored, medium-sized moths. Evidence that this family, which is currently divided into 28 subfamilies, forms a natural group is unconvincing, and significant changes to the classification should be anticipated. As in most of the remaining Noctuoidea, the noctuid cubitus is usually quadrifid, but this is not a constant feature. As exemplified by the underwing genus Catocala, the forewings of the Noctuidae are almost always dully and cryptically colored. At rest, the forewings cover the hindwings, which may be strikingly colored. Most moths attracted to lights at night belong to this family. The larvae are mostly exposed foliage feeders, but a few such as Papaipema are borers. Some Eublemma species prey on scale insects, one being an important enemy of the commercially valuable lac insect. Pupation is usually in the ground. The family includes many agricultural pests: the cutworms, Euxoa and Peridroma, attack a large variety of plants, as does the army worm, Pseudaletia unipunctata. The former derive their name from the habit of cutting off shoots at the surface of the soil without consuming them; the latter derive their name from the fact that they often appear in vast numbers. The exceedingly important pest Helicoverpa armigera is variously known as the corn earworm, cotton bollworm, and tomato fruitworm. Lymantriidae are the tussock moths. Evidence that this group of medium-sized moths forms a natural group is largely lacking, but the eversible middorsal gland on abdominal segment VI of the larva may be a reliable character. The antennae of the males are broadly pectinate. The end of the female abdomen often has a tuft of detachable hairs which are deposited on top of the eggs. The hairy larvae usually have prominent toothbrush tufts. The most familiar species is the infamous European gypsy moth (Lymantria dispar) which, along with the destructive brown-tail moth (Euproctis chrysorrhoea), was imported into New England in the latter half of the nineteenth century. Arctiidae are a relatively large family of often strikingly colored, frequently heavy-bodied insects, the tiger moths. The arctiids are characterized by a pair of eversible pheromone glands on the dorsum of the female abdomen, connected with the anal papillae. The larvae are often very hairy and feed either exposed or in webs. Larval hairs are incorporated into the silken cocoon. In North America, the best-known member of the subfamily Arctiinae is probably the banded wooly bear caterpillar (Pyrrharctia isabella). There is a widespread misconception that the banding pattern of this larva predicts the severity of the coming winter. In Europe, the wooly bear of the spectacular garden tiger moth (Arctia caja) is equally well known. The larvae of the Lithosiinae are unusual in feeding on algae, lichens, liverworts,
or mosses. The third arctiid subfamily currently recognized, the Syntominae, includes many remarkable wasp mimics. Despite their great beauty and fascinating biology, the Arctiidae remain poorly known, and many discoveries can still be made. Superfamily Drepanoidea. This relatively small superfamily includes just two families, linked by the unusual form of the larval mandibles, which have a large flat lateral area marked off beneath by a distinct ridge. The group is thought to be closely related to the Geometroidea. Epicopeiidae are a small group of brightly colored day-flying species that appear to mimic a variety of other diurnal Lepidoptera; they are known only from the Palearctic and tropical Asia. The more familiar Drepanidae include about 650 species divisible into three subfamilies, of which the Drepaninae are the largest and include the hooktip moths, named for their forewing shape. The Drepanidae are characterized by an abdominal hearing organ located between tergum I and sternum II of a type not found in any other Lepidoptera, including the Geometroidea. Superfamily Geometroidea. Three families are included, the somewhat butterflylike Sematuridae (about 40 species, which lack abdominal hearing organs), the Uraniidae (about 700 species, also somewhat butterflylike but with abdominal hearing organs), and the Geometridae (a very large group of more than 20,000 species characterized by the unique form of the abdominal hearing organs). The three families are linked together by a character of the larval spinneret. The tropical Uraniidae include some slender-bodied, brilliantly colored, diurnal insects that lack a frenulum and are often mistaken for butterflies. The multiple-tailed metallic Madagascan sunset moth, Chrysiridia madagascariensis, is considered by many to be the most beautiful lepidopteran. It also offers a fascinating example of asymmetry: almost all Lepidoptera have highly symmetrical wing patterns, but this is not the case in Chrysiridia (and some other uraniids) in that the black spots on the hindwing orange area, and the black markings on the forewing green areas, invariably differ right to left. Uraniid larvae are diverse in form but have a complete set of prolegs. The Geometridae include the measuring worms, loopers, and cankerworms, which generally turn into small and medium-sized moths with slender bodies and relatively broad wings. The females are occasionally wingless (apterous). The larvae have reduced or absent anterior prolegs, and usually only those on segments VI and X are well developed. They proceed with a characteristic looping (earth-measuring) motion. The larvae are usually exposed feeders, but sometimes live in folded leaves and often bear a striking resemblance to twigs. This resemblance is enhanced by their habit of resting with the body held rigid at an angle to the branch while holding on with the terminal prolegs. Pupation takes place in the ground or in a weak silken cocoon. This very large family is currently divided into eight subfamilies. Among the economically
important Nearctic geometrids are the spring and fall cankerworms, Paleacrita vernata (subfamily Ennominae) and Alsophila pometaria (subfamily Alsophilinae), which attack a large variety of trees. The famous peppered moth (Biston betularia), so much studied regarding industrial melanism (below), is another member of the Ennominae. Also of interest is Idaea bonifata (subfamily Sterrhinae), the herbarium moth, whose larvae are pests in collections of dried plants. The beautiful emerald moths belong to the subfamily Geometrinae. Superfamily Hedyloidea. This is a small group of about 40 mothlike species with a curious resting posture, long placed in the Geometridae but recently considered to be the likely closest relatives of the Rhopalocera (skippers and true butterflies). Their early stages are remarkably similar in some ways to those of Papilionoidea; the adults stand on only four legs like Nymphalidae; and research on their DNA confirms a close relationship. The Hedylidae comprise just one genus, Macrosoma, found in Central and South America. Superfamily Hesperioidea. One rather large family, the Hesperiidae, is recognized, with about 3500 known species and probably many more unrecognized ones. The skippers, or skipper butterflies, are small to moderately large, heavy-bodied, mostly diurnal insects. The clubbed antennae are set wide apart on the head and are bent, curved, or reflexed at the tip. Forelegs are fully developed and bear a large spur (epiphysis). The forewings usually have all veins arising separately from the discal cell, and the frenulum is absent except in the male of Euschemon. Near the base of the hindwing, a small patch of specialized scales apparently occurs in all species. The larvae have a prominent constriction, or neck, behind the head, and often live in leaves drawn together by silk. However, the caterpillars of the giant skippers (Megathymus group), which are borers in yucca and agave in the Americas, lack this neck. The pupa is usually enclosed in a slight cocoon. The name skipper refers to the typically rapid erratic flight. American species include the silver-spotted skipper (Epargyreus clarus) and the sachem skipper (Atalopedes campestris). In Mexico, agave caterpillars are fried and canned for human consumption. The Australian Euschemon rafflesia was once placed in its own subfamily, largely because of the presence of a frenulum in the male, but it is now included in the Pyrginae, a group of some 1000 species and one of six subfamilies into which the skippers are currently divided. New molecular studies based on DNA are expected to bring many improvements to skipper classification. Superfamily Papilionoidea. These so-called true butterflies, together with the Hesperioidea, make up the Rhopalocera. The two groups share some specialized features, are usually regarded as each other’s closest relatives, and are together perhaps most closely related to the Hedyloidea. However, it is possible that the hedylids will eventually prove to be more closely related to one of the two Rhopalocera superfamilies. The Papilionoidea are small-to-large di-
urnal insects [a few are very large, as measured by wingspan, sometimes attaining 250 mm (9.8 in.)], typically with clubbed antennae that are rounded at the tip and not bent or reflexed. The forewings always have two or more stalked veins arising from the discal cell. The frenulum is always absent. The larvae have no constriction behind the head and are usually exposed feeders. The pupa is naked except in the apollos (Parnassius) and their relatives and some Lycaenidae, and is most often suspended head-down by caudal hooks (the cremaster) or held semi-upright by a silken girdle. About 14,000 species are currently recognized worldwide, divided among five families. The Papilionidae are the family that includes the swallowtails and parnassians. These are the only true butterflies with a fully developed foreleg bearing an epiphysis. The hindwings have only one well-developed vannal vein, except in the anomalous Mexican Baronia brevicornis (subfamily Baroniinae), while the larvae always have a unique eversible forked organ on the prothorax, the osmeterium. This organ dispenses a disagreeable odor that seems to function as a defense mechanism. The pupa is typically girdled. In the boreal genus Parnassius (subfamily Parnassiinae) and some other members of the family, females may be found with a horny pouch, or sphragis, at the tip of the abdomen. This is secreted by the male during copulation and covers the genital opening of the impregnated female to prevent access by rival males. The commonest genera—Papilio, Graphium, Battus, and Parides (subfamily Papilioninae)—contain large and attractive species, many of which have the characteristic tails that give the family its name. The birdwing butterflies, Ornithoptera, Trogonoptera, and Troides (Papilioninae), are among the largest and most beautiful species. In North America, the eastern tiger swallowtail (Papilio glaucus) has dichromatic females, one form being black-and-yellow-striped like the male, and the other being entirely dark brown or black, apparently mimicking the protected Aristolochia swallowtail (Battus philenor). The larva of the orange dog (Papilio cresphontes) is sometimes injurious to citrus, and in the Old World there are a number of species that can be serious citrus pests, including the Oriental lime swallowtail (Papilio demoleus), recently introduced into the Caribbean. The Pieridae include the familiar whites, sulfurs, and orange tips. These butterflies appear to have unique types of abdominal base and antennae. The forelegs are completely developed in both sexes but lack an epiphysis. The tarsal claws are bifid, while many species exhibit striking sexual dimorphism. However, most are basically white, yellow, or orange. The larvae feed on plants of the families Brassicaceae (mustards and cabbages), Fabaceae (beans, peas, and other legumes), and Capparidaceae (capers). The pupae are suspended by a girdle, as in the Papilionidae. Butterflies of this family are often extremely abundant. The giant sulfurs (Phoebis) and others often
migrate in huge swarms. The larvae of the Neotropical species Eucheira socialis are gregarious, living together in a nest; and in Mexico, where the larvae are harvested for food, there is a population of this butterfly that is semidomesticated. The cabbage worm or small white (Pieris rapae) is one of the most economically important butterflies, attacking cultivated crucifers in Europe and North America. The alfalfa butterfly (Colias eurytheme) is a major pest in America. The Lycaenidae include the blues, gossamers, hairstreaks, harvesters, and coppers. This large family, with about 5000 known species and many still to be recognized, is very closely related to the Riodinidae, and the latter have often been included within the lycaenids. All of these butterflies have a unique thorax in that the lamella of the mesodiscrimen is not complete to the furca. In male Lycaenidae the prothoracic legs, which always lack tarsal claws, are functional; but in Riodinidae they are reduced and not used for walking. In both families the eyes are emarginate at the antennal bases or at least contiguous with them. The larvae are mostly flattened, and the pupa is usually girdled. The larvae often have a mutualistic relationship with ants, which tend and protect them in return for honeydew which they secrete; however, a number are predacious on ant larvae or on plant bugs attended by ants. The ant organs of the Riodinidae differ in structure from those of the Lycaenidae, and recent molecular work confirms that the two groups are separate but are each other’s closest relatives. Lycaenidae are small to medium-sized insects that are often metallic blue or green on the upper surface. Species such as Brephidium exilis and Zizula hylax (subfamily Polyommatinae) have a wingspan of about 15 mm (0.6 in.) and are among the smallest butterflies, all of which are lycaenids. The largest lycaenid is the Indo-Australian Liphyra brassolis (subfamily Miletinae), with a wing expanse of about 75 mm (3 in.); its extraordinary armored caterpillars live in weaver ant nests and prey on their larvae. The harvesters (Miletus, Spalgis, Feniseca; Miletinae) have predacious larvae that feed on homopterans attended by ants, such as aphids, membracids, and coccids. In the United States, the gray or common hairstreak, Strymon melinus (subfamily Theclinae), has at times been an important pest of hops, cotton, and other crops. In various parts of the world, a number of lycaenids are significant pests, notably of beans and other legumes, often feeding on flowers or boring in pods. Species making up the relatively small group of copper butterflies (subfamily Lycaeninae) do not have associations with ants. Widespread in the Holarctic and parts of Asia and Africa, they have a remarkable distribution that includes endemic species in New Zealand, with the closest other members of the group in New Guinea. The Riodinidae, or metalmarks, are primarily Neotropical. They are small butterflies, often brilliantly colored. Like the Lycaenidae, many have associations with ants, but it is currently thought that these associations probably evolved independently in the two families.
The extraordinary gray and semidiaphanous Peruvian species Styx infernalis was formerly placed in its own family or subfamily, but recent discovery of its early stages confirms that it belongs to the Euselasiinae, a group of about 150 species. The 100 or so species that make up the Nemeobiinae form the Old World representatives of the family, and include the European duke of burgundy fritillary (Hamearis lucina). The third subfamily, Riodininae, are exclusively American, and are by far the largest group, exhibiting an extraordinary diversity of color and form, including the beautiful Helicopis butterflies with six tails on the hindwings and raised underside silvery markings that look like drops of water. The Nymphalidae are a family of four-footed butterflies, having the prothoracic legs greatly atrophied in both sexes, especially the males, and useless for walking. However, the best character for bringing all 6000 or so included species together is the tricarinate antenna, which has three parallel ridges running along the entire inner-lower surface of the antennal shaft. The female abdomen of those species examined has a von Siebold organ, a structure unknown elsewhere in the butterflies. Although subdivision of this very familiar group of butterflies has been a source of considerable disagreement over the last 200 years, advances in the application of DNA sequence data, coupled with systematic reviews of their morphology and greater knowledge of the early stages, have recently led to greater clarity and an emerging consensus. The Libytheinae are a small subfamily of fewer than 20 species divided into two genera, Libythea and Libytheana. Their long, forwardly directed labial palpi have earned them the name snout butterflies. The eggs and caterpillars of these little butterflies are superficially similar to those of the Pieridae. Formerly often placed as a separate family, the snouts are now widely accepted as the first side branch of the nymphalid family tree. The next side branch now appears to be the Danainae, a worldwide group of usually warning-colored butterflies which have an acrid taste to humans. Supposedly distasteful to most predators, they appear to be primary models in a great number of mimicry complexes. The weak-flying Neotropical glasswings were formerly placed in their own subfamily, the Ithomiinae, but they are structurally and biologically very similar to the true danaines, and are now included by some in the same subfamily. Many have broad transparent areas in the wings in which the scales are reduced to short hairs; others have bright yellow, black, and orange tiger patterns. The widely distributed and migratory monarch, or milkweed, butterfly (Danaus plexippus) is the best-known representative of the subfamily. In North America, it is mimicked by the viceroy (Limenitis archippus), a member of another nymphalid subfamily, the Limenitidinae. In the Indo-Australian tropics, the largest group is the crow butterflies (Euploea), many species of which are black or largely so. The very large cosmopolitan subfamily Satyrinae, containing the woodnymphs, meadow browns,
graylings, and arctics, is characterized by the typically weak bouncing flight of the imagines (plural of imago, the sexually mature, usually winged, adult stage of an insect) and the bladderlike swellings at the basal sections of the forewing veins of most species. The larvae feed primarily on grasses and other monocots, with a few on lower plants such as lycopods. Erebia and Oeneis penetrate extremely inhospitable arctic and alpine situations, and numerous other satyrines abound in Patagonia, in the high Andes, and in many other montane regions, where many species still await discovery. Subfamily Morphinae is a much smaller but entirely tropical group of large butterflies closely allied to the satyrines. The brilliant metallic-blue species of the genus Morpho are among the most beautiful insects. The owl butterflies (Caligo) have undersides bearing large and very conspicuous eyespots, and also belong to this group. Two more subfamilies that seem to belong with the satyrine-morphine group are the Sino-Himalayan Calinaginae and the pantropical Charaxinae. This last group includes some of the fastest flying and most beautiful butterflies. Most but by no means all species in this entire assemblage feed as larvae on monocots and are strongly attracted as adults to fallen and fermenting fruit. The crown group of the Nymphalidae comprises a further series of subfamilies, of which the Heliconiinae and Limenitidinae may form the first major branch. Well-known heliconiines include Argynnis and Boloria (temperate region fritillaries); Vindula, Cethosia, and Acraea in the Old World tropics; and Heliconius, famous for longevity and remarkable mimetic associations, in the Neotropics. The limenitidines comprise Limenitis (white admirals) together with numerous tropical genera and species. The final series of subfamilies, which may form a terminal group, include the Cyrestinae (tropical map butterflies), Biblidinae (eighty-eights and numerous tropical species), Apaturinae (purple emperors and hackberry butterflies), and Nymphalinae. This last subfamily includes some of the most familiar butterfly genera, such as Polygonia (angle-wings and commas), Nymphalis (tortoise-shells), Vanessa (red admirals and thistle butterflies), Junonia (buckeyes), and Euphydryas (checkerspots). The subfamily is also represented in the tropics by numerous genera and species, including the genus Kallima, or Indian leaf butterflies, the undersides of which bear a remarkable resemblance to dead leaves. See PROTECTIVE COLORATION.
Biological Aspects
The Lepidoptera are a group on which much research remains to be done. A great deal is still unknown about the genetics, developmental biology, physiology, and ecology of these insects despite the fact that butterflies and moths have proved useful as experimental animals in all of these fields. Economically, the larvae of many species are injurious to certain crops (Table 2), often causing severe losses, while a few, such as the silkworm (Bombyx mori),
are the basis of significant industries. In recent years, trade in living tropical butterflies has grown rapidly for the butterfly house industry (insect zoos). A small number of Lepidoptera species cause medical problems, notably larvae with irritating spines and hairs, some of which can cause severe dermatitis. Ecology and distribution. The variation in larval habits has been discussed under the different families. Lepidoptera of all stages are subject to the attacks of a large number of predators, including birds, mammals, lizards, frogs, and spiders. They also must be wary of rapacious insects, such as dragonflies, praying mantids, pentatomid bugs, asilid flies, and vespoid wasps. Certain wasps (family Sphecidae) paralyze caterpillars, lay eggs in them, and place them in specially constructed cells. Others (braconids and ichneumonids) place their eggs in the caterpillars without paralyzing them. In both cases the caterpillars serve as food for the growing wasp larvae inside them. Many chalcid wasps oviposit (lay or deposit eggs) in the eggs of Lepidoptera. See PREDATOR-PREY INTERACTIONS. One mite, Myrmonyssus phalaenodectes, infests the hearing organs of a variety of moths; and another, Otopheidomenis zalestes, is restricted, so far as is known, to the genus Zale (family Noctuidae). Pseudoscorpions of the genera Atemnus, Stenowithius, and Apocheridium have been found on adults of various species. Lepidoptera are also subject to viral, bacterial, protozoan, and fungal infections. See INSECT PATHOLOGY. The Lepidoptera penetrate almost every place on Earth with the exception of Antarctica. Butterflies of the genus Boloria have been taken at Alert on northern Ellesmere Island, about 400 mi (650 km) from the North Pole. Arctic and alpine tundra areas normally support a Lepidoptera fauna that, although relatively poor in species, is rich in numbers. After rains, deserts are often alive with butterflies and moths, but tropical forests are by far the richest in species. One of the strangest habitats occupied by a lepidopteran is the Neotropical three-toed sloth, on which the larvae of the sloth moth (Bradypodicola hahneli; Pyralidae) live, feeding on the algae that grow in the sloth’s hair. The following list indicates some habitats of larval Lepidoptera.
1. Plant associations, terrestrial
a. Feeding exposed on foliage, flowers, or plant lice
b. Feeding on foliage in web or case
c. Leaf and needle mining
d. Living under bark
e. Boring in stem, root, fruit, or seeds
f. In soil, feeding on roots
g. Boring in or feeding on fungi, mosses, liverworts, club mosses, lichens, and ferns
h. Living in dried plant products, such as cereals, flour, dried fruit, refuse, and herbarium specimens
2. Associations with social insects
a. In beehives
TABLE 2. Important species of Lepidoptera injurious to crops (family and species: damage by larvae)
Sesiidae
Peach-tree borer (Synanthedon exitiosa): Feeds under bark; the most important insect injurious to peach
Squash-vine borer (Melittia satyriniformis): Infests various cucurbits; often very destructive
Gelechiidae
Angoumois grain moth (Sitotroga cerealella): Feeds on grain in storage and in the field; imported from Europe
Pink bollworm (Pectinophora gossypiella): Feeds on cotton bolls, and causes failure of blossoms to open; most important insect injurious to cotton
Tineidae
Case-making clothes moth (Tinea pellionella); webbing clothes moth (Tineola bisselliella): Clothes moth larvae infest woolen products, upholstery, furs, and other dried animal products
Carpet moth (Trichophaga tapetzella): Damage similar to that by clothes moths but less common
Tortricidae
Spruce budworm (Choristoneura fumiferana): Defoliator of vast areas of coniferous forest
Ugly-nest tortricids, leaf rollers, etc. (Cacoecia spp., Tortrix spp.): Feed on leaves and fruits of apples and other fruits, strawberry, shade trees, ornamental plants, etc.
Codling moth (Cydia pomonella): Larva bores in apples and some other fruits; most important pest of apples
Oriental peach moth (Cydia molesta): Feeds on fruits and twigs of peaches, plums, etc.
Pyralidae
Bee moth (Galleria mellonella): Destroys combs in neglected beehives
Indian meal moth (Plodia interpunctella): Widespread and very important pest of dried fruit and animal products
Mediterranean flour moth (Anagasta kuehniella): Infests flour, stored grain, cereals
Crambidae
Oriental rice borer (Chilo simplex): Very destructive to rice in Asia
Grape leaf folder (Desmia funeralis): Sometimes an important defoliator of grapevines
Geometridae
Spring cankerworm (Paleacrita vernata), fall cankerworm (Alsophila pometaria): Defoliate fruit and shade trees in outbreak years; spring cankerworm overwinters as a pupa, fall cankerworm overwinters as an egg
Sphingidae
Tomato hornworm (Manduca quinquemaculatus); tobacco hornworm (M. sexta): Large larvae are most conspicuous pests of tomato and tobacco; also feed on other solanaceous plants
Lasiocampidae
Tent caterpillars (Malacosoma sp.): Serious defoliators of forest, shade, and orchard trees
Arctiidae
Fall webworm (Hyphantria cunea): Feeds on wide variety of forest, shade, and fruit trees
Lymantriidae
European gypsy moth (Lymantria dispar): Important defoliator of forest, shade, and fruit trees in New England
Brown-tail moth (Euproctis chrysorrhoea): Similar to gypsy moth
Noctuidae
Cutworms (Euxoa sp., Peridroma sp., etc.): Numerous species damage many crop plants
Army worm (Pseudaletia unipunctata): Varies tremendously in abundance; in epidemic years it does great damage to corn and other grasses
Corn earworm, cotton bollworm, tomato fruitworm, tobacco budworm (Helicoverpa armigera): Exceedingly important cosmopolitan pest attacking many cultivated plants
Pieridae
Cabbage butterflies (Pieris spp., especially P. rapae in United States and Europe): Pests of various cultivated crucifers
Alfalfa butterfly (Colias eurytheme): Important pest of alfalfa in southwestern United States
b. In ant nests
c. Possibly in termite nests
3. Associations with other animals
a. In stored animal products, such as woolens and feathers
b. In hair of three-toed sloths, feeding on algae
c. In plant lice, as parasitoids
4. Aquatic associations
a. Feeding on, boring in, or mining in aquatic plants
b. On rocks, feeding on microorganisms
In temperate zones, different species hibernate at every stage of the life cycle. Individual, seasonal, and geographic variations are common and striking in the Lepidoptera. Color and size are characters most
frequently affected, but wing shape, venation, genitalia, and other structures can also be involved. The number of broods per season often differs with geographic location. Behavior and physiology. Migration is the most spectacular behavior occurring in the order. It is most frequent in the butterflies, such as Phoebis, Danaus, and Libytheana, but is also known in moths such as Chrysiridia and the bogong moth of Australia (Agrotis infusa). Huge migratory swarms of Lepidoptera are frequently reported in many parts of the world; but this phenomenon, as well as the related communal roosting of adults, daily use of flyways, hill topping, and so forth, is poorly understood. Aggregations of butterflies at mud puddles are a frequent sight, and both moths and butterflies have been observed pumping—sipping water steadily from a puddle and ejecting it as a stream of drops or fine spray from the anus.
Evolution and genetics. There is no extinct family in this order, but it must be acknowledged that the known fossil record is quite poor. The family Eosetidae, based on Eoses triassica from the Triassic of Australia, was originally placed in the Lepidoptera, but its position is very doubtful. The earliest fossil lepidopteran is a homoneurous micromoth from the Lower Jurassic (Archeolepis mane), and so it seems possible that the order did first differentiate in the Triassic. Archeolepis was originally placed in its own (extinct) family, but this is no longer accepted, mainly because not enough characters can be appreciated to reliably locate it in, or exclude it from, one of the living families. The phenomena of mimicry and protective resemblance are widespread in the Lepidoptera. It has been demonstrated that eyespot patterns in certain species elicit escape responses in passerine birds and are thus an effective protective device. Lepidoptera have advantages for many kinds of genetic investigation, especially for the study of populations. The appearance of dark forms of various moths in heavily industrialized areas that have been polluted and blackened by soot is a widespread phenomenon. While this industrial melanism, most extensively studied in the geometrid Biston betularia, is now known to involve complex and shifting selection pressures, it remains one of the best-known examples of evolution in action. See POPULATION GENETICS. Paul R. Ehrlich; R. I. Vane-Wright Bibliography. C. L. Boggs, W. B. Watt, and P. R. Ehrlich (eds.), Butterflies: Ecology and Evolution Taking Flight, 2003; M. M. Douglas, The Lives of Butterflies, 1986; J. D. Holloway, G. Kibby, and D. Peggie, The Families of Malesian Moths and Butterflies, 2001; N. P. Kristensen (ed.), Lepidoptera, Moths and Butterflies, 2 vols., Handbook of Zoology, vol. 4 (parts 35, 36), 1999/2003; H. F. Nijhout, The Development and Evolution of Butterfly Wing Patterns, 1991; R. Preston-Mafham and K. Preston-Mafham, Butterflies of the World, 1988; R. Robinson, Lepidoptera Genetics, 1971; M. J. Scoble, The Lepidoptera, 1992; N. Wahlberg et al., Synergistic effects of combining morphological and molecular data in resolving the phylogeny of butterflies and skippers, Proc. Roy. Soc. London B, 272:1577–1586, 2005; M. Young, The Natural History of Moths, 1997.
Lepidosauria A subclass of living and extinct diapsid reptiles. Lepidosauria and their immediate ancestors constitute the Lepidosauromorpha, one of the two major clades of the Diapsida. By definition, Lepidosauria includes the last common ancestor of the living squamates (lizards, snakes, and the limb-reduced burrowing amphisbaenians) and the New Zealand tuatara (Sphenodon) and all of its descendants. This is a narrower definition than that used in older literature, where Lepidosauria was used as a catchall group for all non-archosaurian diapsids. Lepidosaurs differ from
archosaurs in that the lower temporal arcade (the inferior border of the lower temporal fenestra) is typically incomplete, there is never an antorbital fenestra in front of the orbit or a mandibular fenestra in the lower jaw, and the teeth are generally fused in position (acrodont or pleurodont) rather than implanted in sockets (thecodont). Other lepidosaurian characters include a specialized skin-shedding mechanism, paired male hemipenes (copulatory organs that characterize lizards and snakes), fracture planes in the caudal vertebrae that allow the tail to be shed if grabbed by a predator, and specialized knee, foot, and ankle joints that improve locomotion. As in mammals, the ends of lepidosaurian long bones develop separate centers of ossification (epiphyses) that fuse to the shaft at the end of skeletal growth. See ARCHOSAURIA; DIAPSIDA; REPTILIA. Characteristics. Living lepidosaurs can be divided into two subequal groups, Rhynchocephalia and Squamata. Abundant and globally diverse during the early Mesozoic, Rhynchocephalia are represented today only by the endangered tuatara (Sphenodon) of New Zealand. Sphenodon shows some primitive traits in comparison with squamates (such as rudimentary hemipenes, retention of belly ribs, and little or no cranial kinesis), but it is not a “living fossil” as once thought. The skull and dentition are specialized and the eardrum has been lost secondarily. Squamata form the largest and most successful group of living reptiles, with more than 6000 species and a global distribution. Many squamates are small insect eaters, but the group encompasses herbivores and major predators. They range in size from a few millimeters (such as some tiny geckos and chameleons) to several meters (Komodo dragons and anacondas), and in locomotor type they vary from quadrupedal to bipedal, from limbed to limbless, and from terrestrial runners to swimmers, specialized climbers, and gliders. Reproductive strategies range from egg laying to live birth, and at least eight groups have developed one or more all-female (parthenogenetic) species. See RHYNCHOCEPHALIA; SQUAMATA. Fossil record. Most lepidosaurs are small with lightly built skeletons. Such skeletons are preserved and fossilized only under suitable conditions (fine sediments, quiet depositional environment), and their subsequent recovery depends on the use of an appropriate search strategy. Consequently, most lepidosaurian fossils have been recovered either in the search for early mammals (microvertebrate assemblages from caves, fissures, pond deposits) or early birds (fine-grained “lithographic” limestones), and there are major gaps in the record. The ancestors of squamates and rhynchocephalians probably diverged in the Early Triassic, but the first confirmed records are of rhynchocephalians from the Upper Triassic of Britain and Germany. These are already too derived to be an ancestral type, suggesting rhynchocephalians had a significant unrecorded Early to Middle Triassic history. The record of squamates is rather similar. The earliest known lizards are of Early to Middle Jurassic age (India, Central Asia, Britain),
but they include representatives of several major lineages. This implies an earlier, Triassic beginning to the squamate radiation. Susan E. Evans Bibliography. M. J. Benton, Vertebrate Palaeontology, 3d ed., Blackwell, Oxford, 2004; M. J. Benton (ed.), The Phylogeny and Classification of the Tetrapods, vol. 1: Amphibians, Reptiles and Birds, Clarendon Press, Oxford, 1988; T. R. Halliday and K. Adler (eds.), The New Encyclopedia of Reptiles and Amphibians, Oxford University Press, 2002; G. R. Zug, L. J. Vitt, and J. P. Caldwell, Herpetology, Academic Press, San Diego, 2001.
Lepospondyli An assemblage of diverse, extinct, small tetrapods found in rocks from the Mississippian to Permian periods (340–245 million years ago). In older Linnean classifications, Lepospondyli is a subclass of Amphibia, along with Labyrinthodontia (other Paleozoic tetrapods) and Lissamphibia (the modern amphibians). Recent analyses of interrelationships of all Paleozoic tetrapods have changed this view. Labyrinthodontia, for instance, has been found not to include all descendant lineages and has been abandoned as a technical name. However, research demonstrates that Lepospondyli includes all descendants of a single common ancestor (a clade), and continues to be used in both Linnean and phylogenetic classifications. Anatomical features uniting lepospondyls include a lack of labyrinthine infolding of tooth dentine, paired palatal fangs and replacement pits, absence of an otic notch in the back of the skull, a small forwardly oriented projection on the first vertebra (reduced in some), and single spool-shaped vertebrae (see illustration). However, these features are also characteristic of juvenile or small amphibians. Since all lepospondyls are small, a few specialists still question the group’s integrity. See AMPHIBIA; LABYRINTHODONTIA; LISSAMPHIBIA. Taxonomy and characteristics. In Linnean taxonomy, Lepospondyli comprises five diverse orders: Microsauria, Nectridea, Lysorophia, Aistopoda, and Adelospondyli. All are also considered to be clades with the exception of Microsauria, which might include members more closely related to other lepospondyls than other microsaurs. Microsaurs are small, salamander- and lizardlike animals with diverse groupings (11 families) of variously elongate or consolidated body plans and a distinctive cranial articulation. Nectrideans are mostly aquatic, newtlike animals with unique tail vertebrae which support a broad fin for swimming. Adelospondyls, lysorophians, and aistopods are progressively more elongate animals with reduced limbs; aistopods lack limbs entirely, retain only a trace of the shoulder girdle, and have over 200 vertebrae. All are found in North America or Europe (Euramerica), with the exception of one late-occurring nectridean found in Morocco. See AISTOPODA. Paleontology. Lepospondyls are receiving increased attention from paleontologists in recent
Skulls of lepospondyls. (a) Cardiocephalus, a microsaur. (b) Brachydectes, a lysorophian. (c) Scincosaurus, a nectridean. (d) Batrachiderpeton, a nectridean. (e) Coloraderpeton, an aistopod. (From J. S. Anderson, 2007)
years for two reasons. First, multiple studies indicate that lepospondyls are the last Paleozoic tetrapod group to share a common ancestor with the amniote lineage, so they are important for understanding the early evolution of amniotes. Second, some suggest that lepospondyls gave rise to some or all of the modern amphibians. One hypothesis is that lissamphibians share a unique common ancestor closely related to lysorophians instead of another group of Paleozoic tetrapods, the temnospondyls. Another hypothesis is that caecilians are descended from microsaurs, but salamanders and frogs are descended from temnospondyls. A third hypothesis places salamanders and caecilians within microsaurs but leaves the origin of frogs within temnospondyls. These lepospondyl hypotheses for modern amphibian origins are minority views but are driving much new research into the anatomy of amphibians, including, for the first time, comparisons of development patterns between fossil and modern amphibians. See TEMNOSPONDYLI. Jason S. Anderson
Bibliography. J. S. Anderson, Incorporating ontogeny into the matrix: A phylogenetic evaluation of developmental evidence for the origin of modern amphibians, in J. S. Anderson and H.-D. Sues (eds.), Major Transitions in Vertebrate Evolution, Indiana University Press, Bloomington, in press, 2006; M. J. Benton, Vertebrate Palaeontology, 3d ed., Blackwell Press, London and New York, 2004; R. L. Carroll, Vertebrate Paleontology and Evolution, W. H. Freeman, New York, 1988.
Leprosy A chronic infectious disease caused by Mycobacterium leprae that primarily affects the skin and peripheral nerves and, to a lesser extent, involves the eyes and mucous membranes. Leprosy, or Hansen’s disease, has been known for more than 2000 years. It afflicts at least 3 million people worldwide and is most common in developing countries. There are about 200 new cases yearly in the United States, 85% of them among aliens. The pathogen, M. leprae, has never been cultured in artificial media, but it can be grown in the mouse (in the footpad) and the nine-banded armadillo, among other animals. About 2–3% of armadillos in the wild in the southern United States harbor the infection. Its epidemiology is not fully understood, but transmission probably takes place by the respiratory route. The bacillus is very slow-growing, and the incubation period is usually 3–5 years. Less than 5% of any population is susceptible, and these individuals have a deficient cell-mediated immune response specifically to M. leprae, which may be genetic in origin. Epidemics have occurred, but are rare. Historically, considerable stigma has been attached to the disease and remains a real, though diminishing, problem. See MYCOBACTERIAL DISEASES. A skin rash and loss of feeling due to nerve damage by M. leprae are the hallmarks of leprosy. Usually the nerve damage is mild, but when severe the inability to feel, particularly in the hands and feet, predisposes the individuals to frequent injuries. Nerve involvement may also lead to a loss of muscle function that produces clawing of the fingers and toes as well as other neuromuscular dysfunctions. Manifestations of the disease depend upon the degree of the immune defect. Initially, most patients develop one or several depigmented areas of skin that may have decreased sensation, a stage referred to as indeterminate disease. The condition may self-heal, but if it is diagnosed it is always treated. If treatment or self-healing does not halt the disease, it may progress to one of three advanced types. The mildest of these, tuberculoid disease, is usually manifested as a single large depigmented, scaly, numb area. The most severe type, lepromatous leprosy, usually involves most of the skin to varying degrees, with variously sized nodules or other changes, but some cases have no distinct rash. Between the two extremes immunologically is borderline disease, with skin changes of
Borderline leprosy showing skin changes ranging from large tuberculoid lesions to lepromatous nodules.
both types (see illus.). The World Health Organization’s simplified classification labels indeterminate and tuberculoid patients paucibacillary (few bacilli) and borderline and lepromatous patients multibacillary. Diagnosis is confirmed by a finding of bacteria in material scraped from tiny skin slits or a skin biopsy that shows the presence of bacilli and nerve involvement. Paucibacillary disease is treated with dapsone plus rifampin for 6–12 months. Clofazimine is added for multibacillary individuals and the treatment continued for 2 or more years. Treatment may be complicated by reactive episodes marked by fever, skin changes, and nerve inflammation. These side effects can usually be managed with drugs such as corticosteroids and thalidomide. Isolation is unnecessary, since patients become noninfectious within days of starting treatment. All can, in effect, be cured. See INFECTIOUS DISEASE. Robert R. Jacobson Bibliography. R. B. Conn (ed.), Current Diagnosis 9th ed., 1999; H. D. Humes (ed.), Textbook of Internal Medicine, 4th ed., 2000; R. E. Rakel (ed.), Conn’s Current Therapy, 1999.
Leptin A protein comprising 167 amino acids that is important in the regulation of body weight, metabolism, and reproduction. Leptin was discovered in 1994, when it was identified as the missing protein in mice with a spontaneous single-gene defect that caused obesity. These ob/ob mice were very obese, had many of the metabolic abnormalities associated with obesity, and were infertile. However, when leptin levels were examined in other animal models of obesity and in obese humans, the levels were found to be elevated and not low or absent (as in the ob/ob mice). Thus, general obesity (not due to leptin deficiency) has been termed a leptin-resistant state. Since these initial findings, the biology of leptin has proven to be more complex than originally thought. Secretion. Leptin is secreted primarily from adipocytes (fat cells), although other cells in other locations in the body, including the placenta, stomach, and skeletal muscle, also make leptin. Leptin is
secreted in a pulsatile fashion, and the average levels vary over the day (with the lowest levels in midmorning and the highest levels at night). The amount of leptin secreted by adipocytes is in proportion to the size of the cells. In animals, leptin levels increase soon after a meal and decrease with fasting. In humans, leptin levels do not change significantly with normal meals or with reasonable levels of exercise. However, with massive overfeeding, leptin levels will increase by up to 40%, and with fasts longer than 24 hours, leptin levels will decrease as much as 50%. Thus, in humans, leptin appears to play a role in the more chronic regulation of body fat rather than the meal-to-meal changes in energy intake and energy expenditure. Insulin requirement. Insulin is required for leptin secretion. In animals and in cell culture, insulin increases leptin expression. Likewise, leptin levels decrease when animals are made insulin-deficient. There is also a correlation between insulin levels and leptin levels in humans, but it is difficult to separate the confounding effect of increasing adipose tissue. When humans are deficient in insulin, leptin levels are low, but body fat is also decreased. However, when humans are given excess insulin for several hours (while keeping the blood glucose normal), most studies find no changes in leptin levels. Other hormones that increase leptin secretion include glucocorticoids (such as prednisone) and cytokines (such as tumor necrosis factor alpha and interleukin-1). Hormones that decrease leptin secretion include catecholamines (such as isoproterenol and epinephrine), testosterone, and perhaps thyroid hormone. Gender difference. The leptin level in women is almost three times higher than the leptin level in men. One reason is that women have a higher percent of body fat than men. Additionally, women have more subcutaneous fat (they tend to be “pear-shaped”) than men (they tend to have more abdominal fat and are “apple-shaped”). Subcutaneous fat produces more leptin than abdominal fat, thus contributing to this gender difference. Finally, testosterone, which is present in much greater concentrations in men, decreases leptin and further augments the gender difference. Receptors. The primary role of leptin appears to be in decreasing food intake by way of its action in the brain. Leptin’s actions are exerted by binding to specific receptors. There are at least six different splice variants (isoforms) of the receptor for leptin. All of these receptors contain a leptin-binding area, and all but one receptor span the cell membrane. However, only one receptor (the long-receptor isoform, or Ob-Rb) has complete signaling function and is found predominantly in the hypothalamus (in areas known to control feeding behavior and hunger). The short-receptor isoforms are located mainly in peripheral tissues (such as the lung, kidney, liver, and gonads) and the choroid plexus, but the current role of these short forms of the leptin receptor is still under investigation. They likely play a role in leptin transport in the blood and across the blood-brain barrier.
At least part of the resistance to leptin associated with obesity might be due to decreased transport across the blood-brain barrier. JAK-STAT signaling pathway. When leptin binds to the long form of the receptor, the receptor dimerizes (two receptors come together in combination with leptin) and several cascades are initiated by way of activation of the JAK (janus kinase)–STAT (signal transducers and activators of transcription) pathway (see illus.). This activation by JAK results in phosphorylation of tyrosine amino acids on the leptin receptor, which causes changes in gene expression (via STAT pathways that ultimately decrease appetite and increase energy expenditure), downstream phosphorylation and dephosphorylation of enzymes, alterations in ion channels, and other signal modulation (in part via SOCS-3, a suppressor of cytokine signaling). SOCS-3 can actually feed back to the leptin receptor and inhibit leptin signaling, and this may also play a role in the leptin resistance seen in obesity. It has recently been shown that the effects of leptin on appetite occur through the STAT pathway, but the effects on maintaining reproductive hormones require only the JAK and not the STAT pathway. Additionally, activation of the leptin receptor can also activate pathways analogous to those of the insulin receptor, including the PI-3 kinase pathway (phosphatidyl inositol kinase pathway, which stimulates glucose uptake and other metabolic effects of insulin) and the MAP kinase pathway (mitogen-activated kinase pathway, which stimulates cell growth). Regulation of body weight. Leptin is believed to regulate body weight through its effects on appetite and energy expenditure. When leptin is given to animals, there is a marked decrease in appetite and an increase in energy expenditure. Neural pathways of appetite regulation. The decrease in appetite associated with leptin administration is due to the stimulation of neurons that inhibit feeding and the inhibition of neurons that promote feeding. The long form of the leptin receptor is found on two distinct types of neurons, the POMC/CART (proopiomelanocortin and cocaine- and amphetamineregulated transcript) neurons and the NPY/AGRP (neuropeptide-Y and agouti-related peptide) neurons. The POMC/CART neurons decrease food intake when stimulated by neurotransmitters such as leptin and α-MSH (melanocyte-stimulating hormone, a product of POMC that is made in the hypothalamus and binds to the melanocortin receptors on these neurons). Leptin also binds directly to the NPY/AGRP neurons and decreases food intake by inhibiting the neurons’ ability to be stimulated. However, NPY and AGRP stimulate feeding. NPY stimulates food intake by binding directly to receptors on the NPY/AGRP neurons, whereas AGRP stimulates food intake indirectly by antagonizing or counteracting the effect of α-MSH on the melanocortin receptors. Effects on energy expenditure. The increase in energy expenditure is due to stimulation of the sympathetic
Leptin signal cascade. Leptin binds to the long form of the receptor, resulting in receptor dimerization and initiating several signaling cascades. The primary cascade signals by way of activation of the JAK-STAT pathway. The activation by JAK results in phosphorylation of tyrosine amino acids (shown as P in circle) on the leptin receptor. This in turn results in changes in gene expression via STAT pathways, which ultimately decrease appetite and increase energy expenditure. The maintenance of reproductive function requires only the JAK and not the STAT pathway. Signal modulation occurs in part via SOCS-3, a suppressor of cytokine signaling. SOCS-3 can feed back to the leptin receptor and inhibit leptin signaling. (Adapted from D. L. Nelson and M. M. Cox, Lehninger Principles of Biochemistry, 3d ed., Worth Publishers, New York, 2000)
nervous system, which causes increases in oxygen consumption and metabolic rate. In addition, leptin has been shown to induce adipose cell death (apoptosis). As animals become more obese, the response to leptin given peripherally (into the abdomen) decreases, and they become leptin-resistant. Although the ability to respond to central leptin administration (leptin given directly into the brain) remains intact, with continued obesity there is also a decrease in the response to central leptin administration. This adipose cell death, along with an alteration of how the body handles nutrients following leptin administration, might be why there is a preservation of muscle with weight loss (in contrast to weight loss from caloric restriction alone, which is accompanied by a loss in muscle). Regulation of reproductive hormones. When leptindeficient mice were first discovered, they were found to be infertile as well as obese. When leptin was given back to these animals, fertility was restored. In addition, when leptin is given to animals prior to puberty, they undergo puberty earlier than control animals. Thus leptin appears to be part of the system that tells the brain there are adequate energy stores (adipose tissue) for reproduction and allows puberty to commence. Leptin has been shown to increase the secretion of the hormones that regulate reproductive function, including gonadotropin releasing hormone, luteinizing hormone, and follicle stimulating hormone. Other potential roles. Although most of the attention of leptin has centered on its role in decreasing
food intake and decreasing weight, it is not certain that this is its primary role. Until very recently, there has been little need for a hormone that inhibits an organism from gaining as much weight as possible, for more efficient weight gain appears to protect against the next famine. Therefore, leptin likely has other roles as a hormone. Fat accumulation. One role that has received much attention lately is that of a hormone that prevents fat accumulation in nonadipose tissues, including muscle, the liver, and the pancreas. Excess fat accumulation has been shown to be detrimental to these organs and may play a significant role in the development of diabetes. Leptin may traffic fat in such a way that it does not accumulate in these tissues. In addition, leptin increases utilization of fat by producing heat and not energy. Thus leptin might enhance more appropriate and rapid disposal of fat that enters these tissues. Fasting. Another potential role of leptin is in mediating the endocrine response to fasting. When a human or mouse is fasted, there is a decrease in levels of leptin, reproductive hormones, thyroid hormone, and insulin. However, there is an increase in levels of glucocorticoids, catecholamines, and glucagon. This results in a conservation of energy, an increase in the production of glucose for the brain, and an increase in the release of fatty acids from the adipose tissue for muscle. When mice were fasted but given leptin, there was a blunting in several of these responses, including the changes in levels of glucocorticoids, reproductive hormones, and thyroid hormone. Thus
Lepton rather than acting as a signal of excess adipose tissue, leptin’s primary role might be in signaling a decrease in energy, and initiating the stress response to this decrease. Blood cell and bone formation. There is currently much debate on the extent of direct leptin action outside the brain. The gastrointestinal tract does have the long form of the leptin receptor, and signals from this receptor might play a role in nutrient absorption or appetite regulation. In addition, there is evidence that leptin has a direct role in the formation of bone red blood cells, white blood cells, and blood vessels. Treatment of obesity. There are only five known people (from two families) who are deficient in leptin (like the ob/ob mice). They are very obese and seem to have reproductive abnormalities, but they do not appear to have other abnormalities such as increased glucose or insulin. Treatment with leptin has had a dramatic impact on their weight and has allowed puberty to occur. There are also three people (from one family) who do not make normal leptin receptors. These individuals have very high leptin levels, are very obese, do not have normal reproductive function, and have abnormal levels of thyroid hormone and growth hormone. Similar to the patients with leptin deficiency (and unlike the mouse models), they do not have high levels of blood glucose. Leptin is also under investigation as a treatment for obesity in people with elevated leptin levels. Initial reports suggest that leptin administration results in weight loss in these subjects (with preservation of muscle), with the degree of weight loss similar to other pharmacologic treatments for obesity (about 8%). However, since leptin is a protein, it cannot be taken by mouth and must be injected, which sometimes results in pain, redness, inflammation, and bruising at the injection site. This mode of administration and the injection site reactions limit its use as a treatment. Whether administration of leptin or of a leptin-like compound will ever become a treatment for obesity remains to be seen. However, the discovery of leptin has opened a new area in the understanding of obesity and body weight regulation. See DIABETES; ENDOCRINE SYSTEM (VERTEBRATE); HUNGER; INSULIN; LIPID METABOLISM; METABOLIC DISORDERS; OBESITY. William T. Donahoo; Robert H. Eckel Bibliography. R. S. Ahima and J. S. Flier, Leptin, Annu. Rev. Physiol., 62:413–437, 2000; C. A. Baile, M. A. Della-Fera, and R. J. Martin, Regulation of metabolism and body fat mass by leptin, Annu. Rev. Nutrit., 20:105–127, 2000; R. B. Harris, Leptin—much more than a satiety signal, Annu. Rev. Nutr., 20:45–75, 2000; B. Jeanrenaud and F. Rohner-Jeanrenaud, Effects of neuropeptides and leptin on nutrient partitioning: Dysregulations in obesity, Annu. Rev. Med., 52:339–351, 2001; D. L. Nelson and M. M. Cox, Lehninger Principles of Biochemistry, 3d ed., Worth Publishers, New York, 2000; R. H. Unger, Leptin physiology: A second look, Regulatory Peptides, 92(1–3):87–95, 2000.
Lepton An elementary particle having no internal constituents. There are three known charged leptons: the electron (e−), the muon (µ−), and the tau (τ−). In addition, the corresponding antiparticles e+, µ+, τ+ are known. Each experiences electromagnetic, weak, and gravitational forces, but not the strong (nuclear) force. Associated with each charged lepton is a corresponding neutral lepton, called a neutrino (νe, νµ, ντ) or antineutrino (ν̄e, ν̄µ, ν̄τ); these have only weak and gravitational interactions. A charged lepton and its associated neutrino form a lepton generation; thus there are three known lepton generations (see table). See ELECTRON; NEUTRINO. Leptons are very small, less than 10⁻¹⁸ m in radius. This is less than 10⁻³ of the radius of a nucleus and less than 10⁻⁸ of the radius of an atom. Indeed, all existing measurements are consistent with the assumption that leptons are point particles. The lepton family of particles has very different properties than the quark family of particles. Quarks interact through the strong force as well as through the electromagnetic, weak, and gravitational forces. By means of the strong force, quark-antiquark pairs bind together to form mesons such as the π meson, and three quarks bind together to form baryons such as the proton. Whereas, as far as is known, quarks are always confined inside hadrons (mesons and baryons) and cannot be studied as isolated particles, leptons act as individual particles and can be studied as isolated particles. See BARYON; FUNDAMENTAL INTERACTIONS; HADRON; MESON; QUARKS. More charged leptons or neutrinos may exist. Searches have been conducted up to 45 GeV in mass for additional charged leptons or neutrinos, but none have been found. Existing accelerators are not powerful enough to allow searches above 45 GeV. The reason there are only three lepton generations below 45 GeV in mass (and that there are only three generations of quarks) is unknown. Decay. The electron is a stable particle: it never decays. The muon and tau are unstable particles, and their average lifetimes have been measured (see table). There is only one known mode of muon decay [reaction (1)].
µ− → νµ + e− + ν̄e
(1)
The tau is heavier than the muon and thus decays more rapidly and in many modes, as in reactions (2).
τ− → ντ + µ− + ν̄µ
τ− → ντ + e− + ν̄e
τ− → ντ + π−
(2)
Lepton conservation and lepton mass. The association between charged leptons and their neutrinos has been described by an empirical law called lepton conservation. For example, consider the electron (e−). There are only two ways in which an e− can be destroyed: it can be annihilated by combining it with an e+ (positron); or it can be changed into a νe. But an e− cannot be changed directly into any other
Details of the three lepton generations

Generation 1: Electron (e±); dates of discovery, 1890s; mass, 0.51 MeV; lifetime, stable; associated neutrinos, νe, ν̄e
Generation 2: Muon (µ±); dates of discovery, late 1930s; mass, 105.7 MeV; lifetime, 2.2 × 10⁻⁶ s; associated neutrinos, νµ, ν̄µ
Generation 3: Tau (τ±); dates of discovery, 1974–1975; mass, ∼1777 MeV; lifetime, 3.0 × 10⁻¹³ s; associated neutrinos, ντ, ν̄τ
lepton or hadron, such as an e+, µ−, µ+, τ−, τ+, or π. Furthermore an e− cannot be changed into a muon-associated neutrino (νµ, ν̄µ) or into a tau-associated neutrino (ντ, ν̄τ). Thus it seems that there is a unique property of the electron (e−) which is preserved or conserved in all reactions; the only other particle carrying this unique property is the electron-associated neutrino (νe). Similar lepton conservation laws appear to hold for the µ and τ. The only other particle carrying the unique property of the µ− is the muon-associated neutrino (νµ). The only other particle carrying the unique property of the τ− is the tau-associated neutrino (ντ). Until 1998 all experimental evidence was consistent with the assumption that lepton conservation is universally valid. However, it is now known that the law breaks down in the phenomenon of neutrino oscillations, which occurs because (1) the neutrinos νe, νµ, ντ do not have definite masses; (2) a quantum-mechanical mixing of neutrino states can occur. It is now known from the results of neutrino oscillation experiments that the quantum-mechanical state of, for example, νµ must be expressed as a specific linear combination of three neutrino mass states, at least two of which are associated with definite and distinct nonzero mass (and the same general statement is true of νe and ντ). Consider a νµ that is created at time t = 0 in muon decay [reaction (1)]. As this neutrino propagates through space, the states of definite and distinct mass of which it is composed evolve at different rates and thus get out of phase. The result is that after a finite time has elapsed, the neutrino is no longer a pure νµ but is in a superposition of νe, νµ, ντ states. Hence there is a finite probability that in a collision with another particle this neutrino could be converted to an electron or (if enough energy is available) even to a tau lepton. This violates lepton conservation. The masses of the charged leptons (see table) are given in the energy equivalent unit megaelectronvolts (MeV). One MeV is the energy gained by an electron that is accelerated through a voltage of 10⁶ V, and is equal to 1.602 × 10⁻¹³ joule. The masses of the electron, muon, and neutrinos are smaller than the masses of any of the hadrons. However, the discovery of the tau, whose mass is larger than that of many hadrons, destroyed the concept that leptons had to be very small mass particles. The mass values m₁, m₂, m₃ of the neutrino states of definite mass are not yet known. At present, results of neutrino oscillation experiments reveal only that
Eqs. (3) apply, where c is the velocity of light.

m₂² − m₁² = 8.1 × 10⁻⁵ (eV/c²)²
m₃² − m₁² = 2.2 × 10⁻³ (eV/c²)²
(3)
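The practical consequence of these mass-squared differences can be illustrated with the standard two-flavor vacuum-oscillation formula, in which the probability that a neutrino retains its original flavor after traveling a distance L is 1 − sin²(2θ) sin²(1.27 Δm²L/E), with Δm² in (eV/c²)², L in kilometers, and the neutrino energy E in GeV. The short Python sketch below evaluates this expression using the smaller splitting from Eqs. (3); the mixing angle, baselines, and energy are illustrative assumptions rather than values taken from this article, and a complete treatment would mix all three mass states.

```python
import math

# Two-flavor vacuum-oscillation sketch (illustrative only; the full problem
# mixes three mass states). Units: delta_m2 in (eV/c^2)^2, L in km, E in GeV.
def survival_probability(delta_m2, L_km, E_GeV, theta):
    phase = 1.27 * delta_m2 * L_km / E_GeV          # standard oscillation phase
    return 1.0 - math.sin(2.0 * theta) ** 2 * math.sin(phase) ** 2

DELTA_M2_21 = 8.1e-5    # m2^2 - m1^2 from Eqs. (3), in (eV/c^2)^2
THETA = 0.59            # assumed mixing angle in radians (illustrative)
ENERGY = 0.005          # 5-MeV neutrino expressed in GeV (illustrative)

for baseline_km in (1.0, 50.0, 200.0):              # illustrative baselines
    p = survival_probability(DELTA_M2_21, baseline_km, ENERGY, THETA)
    print(f"L = {baseline_km:6.1f} km: survival probability = {p:.3f}")
```

Only a nonzero mass-squared difference makes the survival probability depend on the baseline, which is why the observation of oscillations establishes that at least two of the neutrino mass states are massive.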
The results expressed in Eqs. (3) imply that at least two of the three states are associated with nonzero (but certainly very small) mass.
Production. Leptons can be produced in various ways. One way is through the decay of a hadron. For example, a muon and its associated antineutrino are produced when a pion decays, as in reaction (4).
π− → µ− + ν̄µ
(4)
Another way to produce leptons is through the annihilation of an electron with a positron; for example, muons can be produced by reaction (5), and taus can be produced by reaction (6).
e+ + e− → µ+ + µ−
(5)
e+ + e− → τ+ + τ−
(6)
It was through reaction (6) that the tau was discovered. See ELEMENTARY PARTICLE. Eugene D. Commins; Martin L. Perl Bibliography. E. Commins and P. Bucksbaum, Weak Interactions of Leptons and Quarks, Cambridge University Press, 1983; F. Halzen and A. D. Martin, Quarks and Leptons: An Introductory Course in Modern Particle Physics, Wiley, New York, 1984; L. B. Okun, Particle Physics: The Quest for the Substance of Substance, Harwood, London, 1985.
Leptospirosis A bacterial disease caused by a Leptospira species that is now recognized as one of the new, emerging infectious diseases. The Centers for Disease Control and Prevention (CDC) has referred to “new, reemerging, or drug-resistant infections whose incidence in humans has increased within the past two decades or whose incidence threatens to increase in the near future.” Leptospirosis, presumed to be the most widespread zoonosis (a disease of animals that may be transmitted to humans) in the world, is characterized by a broad spectrum of clinical manifestations, varying from a subclinical infection (showing no symptoms) to a fatal disease. Etiologic agents. Leptospires are spirochetes (flexible, spiral-shaped bacteria with flagella) that are coiled, thin, and highly motile; have hooked ends;
and contain two flagella, which enable the bacteria to move in a watery environment and burrow into and through tissue. Traditionally, the genus Leptospira contains two species: the pathogenic L. interrogans and the free-living L. biflexa. Leptospires are phylogenetically related to other spirochetes that have a genome approximately 5000 kilobases in size. Recently, a number of leptospiral genes have been cloned and analyzed. Epidemiology. Leptospirosis is primarily an animal disease with a worldwide distribution that affects at least 160 mammalian species. The incidence is significantly higher in warmer climates than in temperate regions, mainly due to longer survival of leptospires under warm humid conditions. Animals can be divided into maintenance hosts and accidental hosts. Maintenance hosts are species in which infection is endemic and is usually transferred from animal to animal by direct contact. Infection is usually acquired at an early age, and the prevalence of chronic excretion of the leptospires in the urine increases with the age of the animal. The disease is maintained in nature by chronic infection of the renal tubules of maintenance hosts. Rats, mice, opossums, skunks, raccoons, and foxes are the most important environmental maintenance hosts, with farm animals and dogs being domestic maintenance hosts. All of these animals shed the bacteria in their urine. Humans are not commonly infected with leptospires and are considered an accidental (incidental) host. Approximately 100 human cases of leptospirosis are reported to the CDC each year, but this probably misrepresents the total number of cases. Transmission. Transmission of leptospires may follow direct contact with urine, blood, or tissue from an infected animal or exposure to a contaminated environment. Human-to-human transmission is rare. Since leptospires are excreted in the urine, they can survive in water for many months. Leptospires can enter the host through abrasions in the skin or through mucous membranes such as the conjunctiva. Drinking contaminated water may introduce the bacteria through the mouth, throat, or esophagus. The incubation period is usually 1–2 weeks. Certain occupational groups are at high risk for acquiring leptospires. These include veterinarians, sewage workers, agricultural workers, livestock farmers, slaughterhouse workers, meat inspectors, rodent-control workers, and fish workers. They acquire the bacteria by direct exposure or contact with contaminated soil or water. In western countries, there is a significant risk of acquiring leptospirosis from recreational exposure (canoeing, windsurfing, swimming, water skiing, white-water rafting, freshwater fishing) and domestic animal contact. Pathogenesis. The mechanisms by which leptospires cause disease are not well understood. It is known that after the leptospires enter the host, they spread to all organs of the body. The bacteria reproduce in the blood and tissues. Although any
organ can be infected, the kidneys and liver are most commonly involved. In these organs, the leptospires attach to and destroy the capillary walls. In the liver this leads to jaundice, and in the kidneys it causes inflammation (nephritis) and tubular cell death (necrosis), resulting in renal failure. After antibodies are formed, the leptospires are eliminated from all organ sites except the eyes and kidneys, where they may persist for weeks or months, for unknown reasons. Clinical manifestations. The clinical manifestations of leptospirosis can be divided into two forms: anicteric (not jaundiced) and icteric (jaundiced). The majority of infections caused by the leptospires are anicteric (either subclinical or of very mild severity), and patients usually do not seek medical attention. In this mild form, leptospirosis may be similar to an influenza-like illness with headache, fever, chills, nausea, vomiting, and muscle pain. Fever is the most common finding during a physical examination. Most patients become asymptomatic within 1 week. Between 5 and 15% of all patients with leptospirosis have the icteric (severe) form of the disease, which is characterized by jaundice, renal and liver dysfunction, hemorrhaging, and multiorgan involvement, and can result in death. This severe form is also referred to as Weil’s syndrome (after Adolf Weil, a German physician). The mortality rate for this severe form is 5–15%. The jaundice occurring in leptospirosis is not associated with hepatocellular necrosis, and liver function returns to normal after recovery. Acute infections in pregnancy have also been reported to cause abortion and fetal death. The clinical presentation of leptospirosis is biphasic, with the acute or septicemic (characterized by bacteria in the blood) phase lasting about a week. This is followed by the immune phase, characterized by antibody production and excretion of the leptospires in the urine. Most of the complications of leptospirosis are associated with localization of the leptospires within the tissues during the immune phase and thus occur during the second week of the infection. Laboratory findings. Since the kidneys are invariably involved in leptospirosis, related findings include presence of white blood cells, red blood cells, and granular casts (proteinaceous products of the kidney in the urine); mild proteinuria (protein in the urine); and in severe leptospirosis, renal failure and azotemia (an increase in nitrogenous substances in the blood). The most common x-ray finding in the kidney is a patchy pattern that corresponds to the many hemorrhages due to the destruction of blood vessels. Diagnosis. A definitive diagnosis of leptospirosis is based either on the isolation of the leptospires from the patient or on immunological tests. Leptospires can be isolated from the blood and/or cerebral spinal fluid during the first 10 days of the illness, and from the urine for several weeks beginning about the first week of the infection. They can be stained using carbol fuchsin, counterstained, and cultured in a simple medium enriched with vitamins.
Treatment. Treatment of leptospirosis differs depending on the severity and duration of symptoms at the time of presentation. Patients with only flu-like symptoms require only symptomatic treatment but should be cautioned to seek further medical help if they develop jaundice. The effect of antimicrobial therapy for the mild form of leptospirosis is controversial. However, such treatment is definitely indicated for the severe form of the disease, and should begin as soon as a severe diagnosis is made. In mild cases, oral treatment with tetracycline, doxycycline, ampicillin, or amoxicillin may be considered. For more severe cases of leptospirosis, intravenous administration of penicillin G, amoxicillin, ampicillin, or erythromycin is recommended. Most patients with leptospirosis recover. Mortality is highest in the elderly and those who have Weil’s syndrome. Leptospirosis during pregnancy is associated with high fetal mortality. Long-term follow-up of patients with renal failure and hepatic dysfunction has documented good recovery of both renal and liver function. Prevention. Individuals who may be exposed to leptospires through either their occupation or involvement in recreational water activities should be informed about the risks. Some measures for controlling leptospirosis include avoiding exposure to urine and tissues from infected animals, vaccination of domestic animals, and rodent control. See BACTERIA; MEDICAL BACTERIOLOGY; SPIROCHETE; ZOONOSES. John P. Harley Bibliography. R. W. Farrar, Leptospirosis, Crit. Rev. Clin. Lab. Sci., 21(1), 1995; R. W. H. Gillespie, Epidemiology of leptospirosis, Amer. J. Pub. Health, vol. 53, 1963; P. N. Levett, Leptospirosis, Clin. Microbiol. Rev., 14(2), 2001; R. van Crevel, Leptospirosis in travelers, Clin. Infect. Dis., 19:132, 2001.
Leptostraca The only extant order of the crustacean subclass Phyllocarida. The Leptostraca is represented by one fossil and a small number of living genera. These malacostracans (see illus.) are unique in having the carapace laterally compressed to such an extent that it forms a bivalvelike shell held together by a strong adductor muscle. The carapace covers only the thorax, leaving exposed the head, with its uniquely movable rostrum, stalked eyes, paired antennules, and antennae. The thorax is composed of eight malacostracan somites, each with a pair of generally similar appendages (thoracopods). Unlike the typical malacostracan abdomen, the leptostracan abdomen consists of seven somites plus a telson with furcal rami. Only the first six abdominal somites carry paired appendages (pleopods). The retention of a seventh somite and caudal furca, in addition to a fully developed telson in the Leptostraca suggested an evolutionarily early offshoot from the malacostracan stem. Although leptostracans are the only extant
Paranebalia longipes. (After P. A. McLaughlin, Comparative Morphology of Recent Crustacea, W. H. Freeman, 1980)
malacostracans to have seven clearly delineated abdominal somites, evidence from embryological and neurological studies indicates that in some members of the Mysidacea a seventh somite, or the ganglia thereof, remains in developmental stages. See MALACOSTRACA. Leptostracans use the thoracopods to produce a feeding current and, in females, to form a brood pouch. That secondary brooding function suggests that egg-bearing females generally do not feed, and evidence confirms this. Locomotion in leptostracans is accomplished by use of the first four pairs of pleopods. Most leptostracans are bottom dwellers, living on or slightly under the substrate, but one species is holopelagic, one inhabits hydrothermal vents, and still another is a marine cave dweller. See CRUSTACEA; PHYLLOCARIDA. Patsy A. McLaughlin Bibliography. T. E. Bowman, J. Yager, and T. M. Iliffe, Speonebalia cannoni, n. gen., n. sp., from the Caicos Islands, the first hypogean leptostracan (Nebaliacea: Nebaliidae), Proc. Biol. Soc. Wash., 98:439– 446, 1985; R. R. Hessler, Dahella caldariensis, new genus, new species: A leptostracan (Crustacea, Malacostraca) from deep-sea hydrothermal vents, J. Crust. Biol., 4:655–664, 1984; P. A. McLaughlin, Comparative Morphology of Recent Crustacea, 1980; F. R. Schram, Crustacea, 1986.
Lespedeza A warm-season legume with trifoliate leaves, small purple pea-shaped blossoms, and one seed per pod. There are 15 American and more than 100 Asiatic species; two annual species, Lespedeza striata and L. stipulacea, and a perennial, L. cuneata, from Asia are grown as field crops in the United States. The American species are small shrubby perennials (see illus.) found in open woods and on idle land, rarely in dense stands; they are harmless weeds. Common lespedeza, once known as Japanese clover, is a small
Lespedeza. (Soil Conservation Service)
variety of L. striata. This variety, unintentionally introduced in the 1840s and used widely until the late 1920s, has been replaced by a larger variety, Kobe. Korean lespedeza (L. stipulacea) has been widely grown since the mid-1920s and is preferred in the northern part of the lespedeza belt, from Missouri eastward, while Kobe is preferred in the lower part of the belt (south to the Gulf of Mexico and across northern Florida). See LEGUME. The annual lespedezas can grow on poorer wetter land than most other forage crops. They give a moderate yield of good quality. The plants reseed well and the young seedlings at the two-leaf stage are very tolerant of cold. Most crop damage is from parasitic nematodes and severe drought, especially in the spring. The perennial lespedeza (L. cuneata), commonly known as sericea, is adapted to the same growing conditions as the annuals but persists better on steep slopes and for this reason has been used widely in soil conservation. The young growth from well-established sericea is a moderate-quality forage, but the older growth is poor. The perennial is difficult to establish because the seed does not germinate readily unless scarified or exposed to cold and moisture for several months. Several large species of lespedeza, such as the widely used L. bicolor, are used in plantings for wild life to provide seed and cover. See LEGUME FORAGES; NEMATA (NEMATODA). Paul Tabor
Lethal dose 50 One special form of the effective dose 50, or median effective dose. The lethal dose 50, or median lethal dose, is also written as LD50. It is used when the response that is being investigated is the death of the experimental animal. The median lethal dose is therefore the dose which is fatal to 50% of the test animals. The LD50 for strychnine given subcutaneously to rabbits is 0.5 mg/kg of body weight. This description shows that in specifying the LD50 attention has been given to the species of animal, the route of injection, and the weight of the animals used. For other toxic substances a different set of specifications might be important.
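Estimating an LD50 from experimental data is essentially a curve-fitting or interpolation problem: groups of animals receive graded doses, the fraction killed at each dose is recorded, and the dose corresponding to 50% mortality is read from the dose-response relation. The short Python sketch below shows the simplest version of this calculation, interpolating on a logarithmic dose scale between the two doses that bracket 50% mortality; the data are hypothetical, and an actual assay would ordinarily use a probit or logistic fit with confidence limits.

```python
import math

# Hypothetical dose-mortality data: (dose in mg/kg, fraction of animals killed).
data = [(0.1, 0.00), (0.2, 0.10), (0.4, 0.35), (0.8, 0.70), (1.6, 0.95)]

def ld50_log_interpolation(points):
    """Interpolate the dose giving 50% mortality on a logarithmic dose scale."""
    points = sorted(points)
    for (d_lo, p_lo), (d_hi, p_hi) in zip(points, points[1:]):
        if p_lo <= 0.5 <= p_hi:
            frac = (0.5 - p_lo) / (p_hi - p_lo)
            log_dose = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_dose
    raise ValueError("50% mortality is not bracketed by the data")

print(f"Estimated LD50 = {ld50_log_interpolation(data):.2f} mg/kg")
```

With the hypothetical data above, the estimate falls between the 0.4 and 0.8 mg/kg dose groups, at about 0.54 mg/kg in this example.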
The use of the concept of LD50 is not restricted to toxicology, because there are many therapeutic substances, such as digitalis, which can be assayed more readily by their lethal effect than by their therapeutic effect. The same active principle is involved in both effects. See BIOASSAY; EFFECTIVE DOSE 50; TOXICOLOGY. Colin White
Lethal gene A gene which brings about the death of the organism carrying it. Lethal genes constitute the most common class of gene changes (that is, mutations) and are reflections of the fact that the fundamental function of genes is the control of processes essential to the growth and development of organisms. In higher diploid forms lethals are usually recessive and expressed only in homozygotes. Dominant lethals, expressed in heterozygotes, are rapidly eliminated and thus rarely detected. Recessive zygotic lethals are retained with considerable frequency in natural populations of cross-fertilizing organisms, while gametic lethals (those affecting normal functioning of eggs and sperm among animals, or the pollen and ovules of plants) are subject to stringent selection and are accordingly rare. Evaluation and analysis. Some lethal mutations have proved to be losses of small or large sections of chromosomal material, rather than gene changes in the strictest sense. The presence of lethal mutants or losses is expressed as a failure of growth, leading to the premature death of the organism carrying them. As most detectable lethal genes are recessive, their effects are observed only in homozygotes. If the processes controlled by a particular gene come early in the developmental sequence, the disturbance usually has fewer ramifications. An analysis of the effects of lethal genes provides a valuable means of investigating the complex processes of embryonic development and cellular differentiation and, in the case of humans, has practical medical implications as well. In microorganisms the study of lethal biochemical mutants opened up a whole new era in the unraveling of biosynthetic processes and led to the establishment of modern biochemical genetics. See CHROMOSOME; MOLECULAR BIOLOGY; MUTATION.
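Because a recessive lethal is expressed only in homozygotes, the expected outcome of a cross can be worked out by simple Mendelian bookkeeping: a mating of two heterozygous carriers yields, on average, one-quarter homozygous normal, one-half heterozygous, and one-quarter homozygous lethal zygotes, so three-quarters of the zygotes survive and two-thirds of the survivors are carriers. The short Python sketch below enumerates the Punnett square for such a cross; the allele symbols are arbitrary placeholders.

```python
from collections import Counter
from itertools import product

# Cross of two heterozygous carriers (Aa x Aa) of a recessive lethal allele 'a'.
# The genotype 'aa' is taken to be lethal; allele symbols are arbitrary.
def cross(parent1, parent2):
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}

zygotes = cross("Aa", "Aa")
survivors = {g: f for g, f in zygotes.items() if g != "aa"}
surviving_fraction = sum(survivors.values())

print("Zygote frequencies:", zygotes)                    # {'AA': 0.25, 'Aa': 0.5, 'aa': 0.25}
print("Fraction of zygotes lost:", zygotes["aa"])        # 0.25
print("Carrier fraction among survivors:",
      survivors["Aa"] / surviving_fraction)              # 0.666...
```

The same bookkeeping applied to a balanced lethal cross, in which both homozygous classes die, shows that only half of the zygotes survive and that every survivor is heterozygous, which is the basis of the permanent heterozygosity discussed under balanced lethal systems below.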
Conditional mutants. Although typically lethal genes express themselves independently of internal and external environments, there are cases in which expression is dependent on some particular condition. The temperature-sensitive (ts) and amber (am) mutants in bacteriophage T4D provide instances where the mutant behaves as a lethal under one set of conditions (restrictive) while under another set of conditions the phenotype of the mutant resembles wild type. Such conditional mutants have been used to analyze gene function in the synthesis and assembly of the macromolecules making up bacteriophage particles. Over 40 conditional lethal mutants have been shown to affect the processes leading to the complete morphogenesis of T4D phage particles. Such conditional lethals are thus invaluable tools for full analysis of gene action in the organization and functioning of biological systems. Obtaining test materials. One of the difficulties in establishing the mechanism of action of lethal genes in higher organisms is that only one-quarter of the progeny in crosses of heterozygous lethals are homozygous. Analysis at the biochemical level can be greatly simplified if progenies consisting wholly of homozygous lethal individuals are readily obtained. Conditional lethal mutants of the temperature-sensitive type, readily produced in the fly Drosophila with the aid of chemical mutagens, which are viable and fertile when raised at 18◦C (64◦F), and completely lethal at 28◦C (82◦F), have been the subject of active investigation in several laboratories. By maintaining flies at 18◦C and transferring eggs to 28◦C, any quantity of lethal tissue becomes available for biochemical study. Occurrence in natural populations. Lethal genes in natural populations of plants and animals have received considerable attention from geneticists. In certain instances these may form the bases of polymorphisms in such populations. In some organisms they have led to what are known as balanced lethal systems: When two different lethal genes (or losses) are located at different points in different members of a pair of homologous chromosomes in such positions, or circumstances, that recombination by crossing over is prevented, only those individuals heterozygous for both of the two different lethals survive. In this way strains, or even species, which are permanent heterozygotes have arisen as in the evening primroses of the genus Oenothera. In human populations the accumulation of lethal mutations has been a matter of some concern in the face of increases in mutagenic agents in the environment and the lowering of selective pressures associated with modern life. The human wastage and misery which accompany high frequencies of lethal genes are considerable. See POLYMORPHISM (GENETICS). Examples. Lethal genes include the following: biochemical lethals in microorganisms such as Neurospora, Escherichia, and Salmonella; lethal chlorophyll mutants of higher plants; notch and many others in the fruit fly Drosophila; creeper in fowl; yellow, brachyury, W-anemia in the mouse; gray-lethal
in the rat; Dexter in cattle; thalassemia, sickle-cell anemia, and many others in humans. See HUMAN GENETICS. Donald F. Poulson Bibliography. E. Hadorn, Developmental Genetics and Lethal Factors, 1961; D. L. Hartl, Genetics, 4th ed., 1998; M. M. Moore et al. (eds.), Mammalian Cell Mutagenesis, 1988; M. W. Strickberger, Genetics, 3d ed., 1985.
Lettuce A cool-season annual, Lactuca sativa, of Asian origin and belonging to the tribe Cichorieae of the Compositae family. Lettuce is grown for its succulent leaves, which are eaten raw as a salad. Four varieties of this leading salad crop are head lettuce (L. sativa var. capitata; see illus.), leaf or curled lettuce (L. sativa var. crispa), cos or romaine lettuce (L. sativa var. longifolia), and stem or asparagus lettuce (L. sativa var. asparagina). There are two types of head lettuce: butterhead, and crisphead or iceberg. See ASTERALES. Propagation. The outdoor crop is propagated by seed usually planted directly in the field, but also occasionally planted in greenhouses for later transplanting. Field spacing varies, but usually plants are grown 10–16 in. (25–40 cm) apart in 14–20-in. (35–50-cm) rows. Greenhouse lettuce, predominantly leaf and butterhead varieties, is transplanted to ground beds with plants placed 7–12 in. (18–30 cm) apart. Uniformly cool weather promotes maximum yields of high-quality lettuce; 55–65◦F (13–18◦C) is optimum. Heading varieties (cultivars) are particularly sensitive to adverse environment. High temperatures prevent heading, promote seed-stalk development, and result in bitter flavor and tip-burned leaves. Thus, commercial production of lettuce is extensive in several California and Arizona valleys where mild
Head lettuce (Lactuca sativa var. capitata). (Asgrow Seed Co., subsidiary of The Upjohn Co.)
winter and cool summer climates prevail. Production in other areas with hot summer weather is restricted to the cooler spring and fall seasons. Crisphead or iceberg lettuce is the most widely grown type; cultivars belonging to the Great Lakes variety account for most of the acreage. Popular cultivars of other types of head lettuce are Butterhead, Dark Green Boston, White Boston, and Bibb; leaf lettuce types are Grand Rapids, Prize Head, and Salad Bowl; cos or romaine types are Parris Island, Dark Green, and Valmaine. Harvesting. The harvesting of heading varieties begins when the heads have become firm enough to satisfy market demands, usually 60–90 days after planting. Most western-grown lettuce is field-packed in paperboard cartons and chilled by vacuum cooling. Leaf lettuce and cos lettuce are harvested when full-sized, but before the development of seed stalks or a bitter taste. This varies from 40 to 70 days after planting. California raises more lettuce than any other state; Arizona, Florida, and Texas are next in importance. Diseases. Lettuce diseases of greatest importance are caused by viruses, fungi, and bacteria which alter the structure and function of the infected parts, causing stunting, malformation, death, discoloration, and breakdown of tissues. The most serious fungus diseases (and their causal organisms) are sclerotinia rot or drop (Sclerotinia sclerotiorum and S. minor), gray mold rot (Botrytis cinerea), downy mildew (Bremia lactucae), anthracnose (Marssonina panattoniana), and bottom rot (Rhizoctonia solani). The most serious of the bacteria parasitic on lettuce are Erwinia carotovora and Pseudomonas marginalis, both of which cause soft rots. Soft rots are especially damaging when lettuce is not cooled quickly after harvest and is not maintained cool (36–40°F or 2–4°C) during storage and transit to market. The potentially most serious disease of lettuce is common mosaic, caused by a virus that is seed-borne and transmitted from infected to healthy plants by aphids. Fortunately, this disease can be avoided by use of virus-free seed. Lettuce is susceptible to several other aphid-transmitted viruses that are not seed-borne. Diseases caused by these viruses are usually less prevalent than mosaic and of minor importance. Important diseases due to other infectious agents are aster yellows, caused by a leafhopper-transmitted spiroplasma, and big vein, caused by an infectious entity that has not been seen with the electron microscope and thus has not been identified and characterized. The causal agent is soil-borne and transmitted by a chytrid fungus, Olpidium brassicae, that infects the roots of lettuce and many other plants without causing evident injury at the site of infection. Tip burn, a physiological disease, is especially damaging to head lettuce due to death and browning of leaf tissues inside the head. It is induced by high ambient temperatures occurring near the time of harvest that disrupt normal organic acid and calcium metabolism. Cultivars now being used have been
selected for increased tolerance to tip burn. See PLANT PATHOLOGY. R. G. Grogan
Leucite rock Igneous rocks rich in leucite but lacking or poor in alkali feldspar. Those types with essential alkali feldspar are classed as phonolites, feldspathoidal syenites, and feldspathoidal monzonites. The group includes an extremely wide assortment both chemically and mineralogically. The rocks are generally dark-colored and aphanitic (not visibly crystalline) types of volcanic origin. They consist principally of pyroxene and leucite and may contain calcic plagioclase or olivine. Types with plagioclase in excess of 10% are called leucite basanite (if olivine is present) and leucite tephrite (if olivine is absent). Types with 10% or less plagioclase are called leucitite (if olivine is absent) and olivine leucitite or leucite basalt (if olivine is present). The texture is usually porphyritic with large crystals (phenocrysts) of augite and leucite in a very fine-grained or partly glassy matrix. If plagioclase occurs as phenocrysts, it is generally labradorite or bytownite and is slightly more calcic than that of the rock matrix. It may be zoned with more calcic cores surrounded by more sodic margins. Leucite appears in two generations. As large phenocrysts it forms slightly rounded to octagonal grains with abundant tiny inclusions of glass or other minerals zonally arranged. Small, round grains of leucite with tiny glass inclusions also occur in the rock matrix. Augite or diopside (sometimes rimmed with aegirine-augite) and aegirine-augite form the mafic phenocrysts. Pyroxene of the matrix is commonly soda-rich. Olivine may occur as well-formed phenocrysts. Other minerals present may include nepheline, sodalite, biotite, hornblende, and melilite. Accessories include sphene, magnetite, ilmenite, apatite, and perovskite. Leucite rocks are rare. They occur principally as lava flows and small intrusives (dikes and volcanic plugs). Well known are the leucite rocks of the Roman province, in Italy, and the east-central African province. In the Italian area the feldspathoidal lavas are essentially leucite basalts and may have developed by differentiation of basalt magma (rock melt). In the African province the leucitic rocks are associated with ultramafic (peridotite) rocks and may have been derived from peridotitic material. This may have been accomplished by the abstraction of early-formed crystals from a peridotitic magma or by the mobilization of peridotitic rocks by emanations from depth. Assimilation of limestone by basaltic magma may help to decrease the silica content and promote the formation of leucite instead of potash feldspar. The crystallization of leucite, however, is in large part a function of temperature and water content of the magma. Conditions of formation, therefore, may strongly influence the formation of leucitic rocks. See IGNEOUS ROCKS; LAVA; PHONOLITE. Carleton A. Chapman
Leukemia A disease characterized by a progressive and abnormal accumulation of white blood cells, or leukocytes. Leukemic cells are malignant because they have three characteristics common to all cancers: (1) they exhibit uncontrolled growth that is frequently associated with an inability to mature normally; (2) they arise from a single precursor cell; and (3) they disregard anatomic boundaries and metastasize to organs or tissues where leukocytes are not normally found. The expanding clone of leukemic cells infiltrates organs and tissues, particularly the bloodstream and bone marrow, where they disrupt the production of normal cells. The resulting symptoms include fatigue, pallor, infections, bruising and bleeding, and discomfort caused by enlarged organs. In humans, the term leukemia encompasses more than 20 distinct malignancies. See BLOOD; HEMATOPOIESIS. Diagnosis and classification. Normal leukocytes are grouped into two primary types or lineages, myeloid and lymphoid, and virtually any cell of either lineage can become leukemic. Leukemias are also divided into broad categories that are based on the cell involved (myeloid or lymphoid) and disease aggressiveness (either acute or chronic). Subclassifications are based on morphologic, cytochemical, immunologic, cytogenetic, and molecular criteria. Depending on the type of leukemia and reason for the classification (treatment options, research, or reaching a prognosis), only one or all of the criteria may need to be determined. Some characteristics, including morphologic and cytochemical, can be established within hours or days, whereas others, such as molecular and cytogenetic composition, may take weeks. The length of time involved is important because therapy for acute leukemia usually needs to be started promptly. Morphologic criteria. A microscopic analysis of stained peripheral blood and bone marrow is the first step in diagnosis (Fig. 1). In the appropriate clinical setting, that alone may be all that is needed to classify some leukemias, such as chronic myelocytic leukemia (Fig. 2). Morphology can differentiate lym-
Fig. 1. Normal stained blood smear showing a granulocyte, lymphocyte, monocyte, some erythrocytes, and a clump of platelets.
Fig. 2. Stained blood sample taken from a patient with chronic myelocytic leukemia. Many of the cells are bone marrow precursor cells, which are not ordinarily found in the circulating blood. In addition, the total leukocyte count is greatly increased over that found in normal blood.
Fig. 3. Two abnormal cells in the peripheral blood of a patient with leukemia, both of which have crystalline structures in their cytoplasm. The crystals, known as Auer rods, are composed of lysosomal enzymes such as acid phosphatase and peroxidase and are virtually always found in patients with acute myeloblastic leukemia.
phoid from myeloid types by the presence of Auer rods, which are intracellular structures that appear only in the acute myeloid leukemias (Fig. 3). Cytochemical characteristics. Special stains that impart an easily recognizable color are routinely used to help distinguish cell lineage. One of the most useful stains detects the enzyme peroxidase, which appears only in myeloid cells. Acute myeloblastic leukemias can usually be subclassified by using only morphologic and cytochemical tests. Immunologic reactivity. Monoclonal antibodies can bind to specific molecules on cells called antigens, and fluorescence or histochemical techniques can be used to reveal antibodies that bind to specific antigens found only on the surface of certain types of cells. Thus, it is possible to assign a leukemia cell to the lymphoid or myeloid categories when the morphologic and cytochemical data are ambiguous or conflicting. Immunodiagnosis has been particularly helpful in subcategorizing lymphoid leukemias and subtyping previously unclassifiable leukemias. Immunologic techniques have also been helpful in establishing clonality (the property of arising from a single cell) and in leading to increased understanding
of basic leukemia biology. See CELLULAR IMMUNOLOGY; MONOCLONAL ANTIBODIES. Cytogenetic analysis. Leukemia diagnosis, prognosis, and etiology have been greatly improved through the analysis of individual chromosomes that have been differentially stained to reveal subchromosomal segments. In most leukemias, distinct chromosomal abnormalities appear nonrandomly with much greater frequency in almost all leukemic cells analyzed. The abnormalities are confined to the leukemic cells and are not found in other cells of the body, and so they are acquired defects. The best-known example is the Philadelphia (Ph) chromosome, a translocation of genetic material between chromosomes 9 and 22 that is detected in more than 90% of chronic myeloid leukemia cases. Cytogenetics also provides information on etiology and prognosis. Deletion of part of the long arm of chromosome 5 (5q− syndrome) or of one of the two chromosomes 7 (monosomy 7) strongly suggests that the leukemia was secondary to prior chemotherapy or arose from a preexisting bone marrow disorder, both of which suggest aggressive disease, poor response to therapy, and a shorter survival time. See CHROMOSOME; HUMAN GENETICS. Molecular studies. Molecular biology identifies and analyzes small fragments of deoxyribonucleic acid (DNA) and has helped to confirm the importance of cytogenetics, the analysis of whole chromosomes. Molecular biology has allowed deeper insight into leukemogenesis and the diagnosis and classification of leukemia. For example, if an increased number of lymphocytes are present (lymphocytosis) but the evidence for lineage or malignancy is not clear, a finding of an abnormal amount of specific DNA that encodes for an antigen present only on the surface of T cells (thymus-derived lymphocytes) is a strong indicator of a T-cell leukemia. Molecular biology has been used following apparently curative therapy to screen blood or bone marrow cells for evidence of abnormal cells not detected by other, less sensitive methods such as morphologic or cytochemical analysis. See DEOXYRIBONUCLEIC ACID (DNA); MOLECULAR BIOLOGY. Differentiating leukemia from a normal reaction to some other disease (secondary leukocytosis) is rarely a problem. However, a persistent yet nonmalignant leukocytosis may require all of the previously mentioned techniques to reach a definite diagnosis. See CLINICAL PATHOLOGY. Epidemiology. In the United States an estimated 55,000 cases of leukemia are diagnosed annually and account for 4–5% of all cancers. These figures, and similar ones for other countries, have remained relatively constant since the mid-1960s. In industrialized countries, leukemia is second only to accidents as the most common cause of death among children under the age of 16, and it is the most common childhood malignancy in these countries. Overall, it is the sixth most common cause of cancer death in the United States. Geographic variations in the incidence of leukemias are less apparent than in any other form of cancer. Although worldwide incidence is variable, there is usually no more than a twofold difference between countries or
regions within countries. The exception is chronic lymphocytic leukemia, which is the most common form of adult leukemia in western countries but is unusual in the Far East, Africa, and South America. In Japan, chronic lymphocytic leukemia is virtually nonexistent. Leukemia can occur at any age and even be present at birth, but its frequency rises sharply after the age of 50. Acute leukemias, primarily the lymphoid type, are most common in children and young adults, whereas the chronic leukemias are rare in children. The majority of adult acute leukemias are myeloid. The incidence of chronic myelocytic leukemia peaks between the ages of 40 and 50. Chronic lymphocytic leukemia is rare under age 40; since 90% of cases are diagnosed in people over 50, it is the most common leukemia in the elderly. A slight male predominance is found in all of the leukemias, especially chronic lymphoid, hairy cell, and childhood acute lymphoid leukemias. Research into environmental or infectious causes has led to the study of clusters or microepidemics of leukemia that are geographically or chronologically contiguous. There are no convincing arguments that leukemia clusters occur other than randomly, except for the adult T-cell form. Although not genetically transmissible in the classic sense, leukemia has a tendency to occur within families. Most striking is susceptibility in identical twins: the risk that the unaffected identical twin of a person with leukemia will develop the disease has been estimated at 200 times that of the general population. Risk falls off rapidly for other family members and is only two to three times higher for nontwin siblings. Even a woman who has leukemia and is pregnant rarely if ever gives birth to a child with leukemia. Certain cell surface antigens are found more often in family members of leukemic patients than in the general population, but the search for a "leukemia susceptibility gene" possibly linked to an inherited human leukocyte antigen (HLA) type has been unsuccessful. Etiology. An estimated 80% of all human cancers are induced by environmental carcinogens, but only a small fraction of exposed individuals develop cancer. Although many agents are suspected of inducing leukemia, for the great majority of cases the etiology is unknown. It appears that no single factor is causative, but a number of events must take place before leukemia occurs. Radiation. The evidence for ionizing radiation as a leukemogenic cofactor is virtually irrefutable. A dose- and time-dependent relationship with the development principally of myeloid leukemias has been shown following radiation exposure from nuclear explosions, as with atomic bomb survivors and atomic test participants; from therapy for ankylosing spondylitis (inflammation of the vertebrae) or cervical cancer; and from work-related contact, such as radiology prior to effective shielding. Radiation appears to have no role in the genesis of chronic lymphoid leukemia, and normal exposure to diagnostic x-rays obtained during a lifetime of routine medical care, to natural sources of radiation, or to nonionizing radiation
does not appreciably increase the risk of leukemia. See RADIATION INJURY (BIOLOGY). Chemicals and drugs. Chronic exposure to high levels of benzene and perhaps related compounds is associated with a tenfold higher risk of developing myeloid leukemia. The risk from exposure to other chemicals is much less clear, however. Agricultural occupations appear to predispose workers to an increased risk of lymphoma and leukemia, possibly as a result of contact with chemicals or animal-borne viruses. See LYMPHOMA. A clear, strong association has been demonstrated between pharmaceuticals (particularly alkylating agents) that are administered as therapy for a primary cancer and the subsequent development of secondary leukemia, virtually always acute myeloblastic leukemia. Compared with chemotherapy that involves alkylators, radiotherapy alone is regarded as weakly leukemogenic, but the combination of both appears to be worse than either alone. Organ transplantation is usually followed by an increased risk of lymphoma, but leukemia risk also rises. The immune compromise induced by drugs that are used to block graft rejection, and possibly the chronic reaction to foreign antigens on the graft as well, appear to be related to the induction of malignancies. Viruses. Only in the rare adult T-cell form of leukemia has a virus been implicated. A strong association has been shown between adult T-cell leukemia and a retrovirus [a type of ribonucleic acid (RNA) virus] called human T-cell leukemia virus I, or HTLV-I. The leukemia that it causes occurs endemically in Japan, especially the islands of Kyushu, Shikoku, and Hokkaido; the Caribbean; the southeastern United States; southern Italy; and parts of South America and Africa. Viruses are known to cause lymphomas and leukemias in nonhuman animals. Examples are bovine and feline leukemia viruses and avian leukosis virus, a retrovirus that is the source of a variety of hematologic neoplasias in domestic fowl. See AVIAN LEUKOSIS; FELINE LEUKEMIA; TUMOR VIRUSES; VIRUS. Congenital chromosomal and acquired abnormalities. Persons with Down syndrome are 30 times more likely to develop acute, usually lymphoid leukemia than the rest of the population. Down syndrome patients have an extra chromosome 21, resulting in a total of 47 chromosomes instead of the normal 46, and the extra chromosome is thought to account for the increased risk for developing leukemia. In fact, leukemias in persons without Down syndrome who have abnormalities of chromosome 21 have been reported. Other rare congenital disorders that are characterized by chromosomal abnormalities and increased risk of leukemia include Fanconi's anemia, a potentially fatal blood disorder; ataxia-telangiectasia, a neuromuscular degenerative disease; and Bloom's syndrome, a condition marked by stunted growth and facial flushing. See DOWN SYNDROME. In the primary immunodeficiency states, malignancies develop 10,000 times more frequently than in unaffected persons, and each of the immunodeficiencies is associated with a distinct leukemia: X-linked
agammaglobulinemia (the condition characterized by lack of or extremely low levels of gamma globulin, together with defective antibody production and frequent infections) with lymphoid leukemia, common variable immunodeficiency with chronic lymphoid leukemia, and ataxia-telangiectasia with acute lymphoid leukemia. Preexisting hematologic disorders. The myelodysplastic syndromes are characterized by ineffective production of normal blood cells and result in low blood cell counts due to abnormal precursor cells in the bone marrow. The abnormal cells are clonal and manifest a spectrum of morphologic and cytogenetic abnormalities. Most have a tendency to evolve into acute leukemia, with the myeloid type predominating. Other clonal blood disorders that are characterized by increased production of one specific cell type or by fibrotic marrow have a slight tendency to evolve into acute leukemia. That tendency is greatly enhanced as a result of treatment with alkylating agents or radioactive compounds. A related disorder, chronic myelogenous leukemia, almost always transforms into an acute leukemia, a change referred to as a blast crisis. This process is heralded in 70% of cases by increased myeloblasts and in 25% by proliferation of lymphoblasts or other immature cells. The metamorphosis changes chronic myelogenous leukemia from a relatively well-tolerated chronic condition into a highly lethal leukemia that fails to respond to treatment. Leukemogenesis. The transformation of a normal cell into a malignant one is complex and not fully understood, but it appears to be caused by a series of events that involve a particular cell in a susceptible host. In all likelihood, these events must occur in an exact sequence and with precise timing, and must be of sufficient strength and duration, such as exposure to radiation. The transforming events permanently alter or mutate the genome and confer upon the cell and its progeny a survival advantage over normal cells. Whether there is a block in maturation, as in acute leukemias, or an increase in stem cells (formative cells that have the ability to self-replicate and to give rise to specialized cells) with no maturation defect, such as is seen in chronic myelogenous leukemia, the end result is similar: expansion of the malignant clone. The importance of cytogenetic changes was made clearer with the use of molecular biologic techniques that led to the discovery of oncogenes. Oncogenes are DNA sequences that were discovered in cancer-promoting viruses and that encode for the production of substances important to the normal cell's transfer of growth signals across the cell membrane. Theoretically, when expressed inappropriately, oncogenes malignantly transform cells. Whether these transformations are by themselves sufficient to cause leukemia is debatable. Cytogenetic and molecular discoveries help to further characterize leukemogenesis but do not yet provide a full explanation. See ONCOGENES. Pathogenesis and clinical course. The two major types of leukemia usually differ in signs and symptoms.
Acute leukemias have a relatively rapid onset, and those with the disease often experience problems immediately. Chronic leukemias have an insidious course and are frequently discovered during an examination for an unrelated problem. For both types, the most consistent symptoms are nonspecific and include weakness, fatigue, mild weight loss, and low-grade fever. Often the leukemic clone first affects blood-forming cells in the bone marrow. To examine the bone marrow, a sample is removed with a needle from the pelvis and then stained and viewed under the microscope. Depending on the type of leukemia, a partial or near-total replacement of normal cells will be apparent. The great numbers of leukemic cells may obliterate all normal bone marrow architecture, including fat spaces that occupy 20% of the marrow cavity in children and as much as 70% in the elderly. It was once thought that decreased production of blood cells was secondary to "crowding out" by the leukemic cells, but it is now clear that the leukemic cells themselves or their products inhibit normal cell production. Leukemic cells are found in the bloodstream. Microscopic examination of a drop of stained venous blood may reveal only a rare abnormal cell or a vast sea of leukemic cells that number as many as 50 times the normal total white cell count, as in chronic lymphocytic and acute myeloblastic leukemias. Leukemic cells may infiltrate organs and tissues other than the blood and bone marrow and cause disturbances in normal function. Their spread to the brain, meninges, spinal cord, kidneys, intestine, adrenal glands, lungs, and heart may cause pain, loss of function, or obstruction. However, symptoms referable to organs other than the bone marrow usually do not appear until later stages of the disease. The chronic leukemias are exceptions since the symptoms that they often present are referable to an enlarging spleen or liver, and these organs may infarct or rupture as they become progressively engorged with leukemic cells that compromise the blood supply. Another exception occurs in the myeloid leukemias, where tumors are composed of masses of leukemic cells called chloromas. Chloromas contain an enzyme that imparts an evanescent greenish tinge to their cut surface and can precede a diagnosis or herald a relapse of leukemia before it is detectable elsewhere. Chloromas have a predilection for soft tissues and are usually found in or near the sinuses and orbital areas of the skull, vertebrae, lymph nodes, skin, and gonads. Because the leukemic cell retains most of the physiologic properties of its normal counterpart, the leukemic clone may appear clinically to be a caricature of a function or attribute of the specific cell type from which it originated. Thus, acute progranulocytic leukemia is invariably accompanied by disturbances in clotting and bleeding due to the release of procoagulant substances from granules in the leukemic cells. Hemorrhage can be fatal, particularly after therapy when leukemic cells die and disgorge their contents. Acute monocytic leukemia,
which exaggerates the normal monocyte function of egress into sites of inflammation or infection, is often accompanied by a marked degree of organ and tissue infiltration, especially of the gums, spleen, liver, and lymph nodes. Therapy. Practical therapeutic goals for the acute and chronic leukemias are distinct. Without prompt, intensive, in-hospital therapy, the acute leukemias usually cause death within a few months. In acute leukemia, the object of therapy is to totally obliterate the leukemic clone and allow normal bone marrow cells to recover. This is achieved through large doses of antileukemic drugs that are potent enough to eliminate all evidence of the leukemia. Unfortunately, such agents lack specificity and destroy normal cells, particularly the dividing cells in the bone marrow, but the doses are calculated to spare normal stem cells that can eventually repopulate the bone marrow and resume normal blood cell production. After the initial intense treatment, repeated cycles are given for as long as 3 years. Specific treatment of localized collections of leukemic cells, especially in so-called sanctuary sites such as the central nervous system, may be included to prevent relapse. In the chronic leukemias, standard therapeutic principles are completely different. Many patients who initially require no therapy begin mild forms of outpatient treatment as the disease progresses. The intent is not to cure but to control the disease with minimal toxicity. A certain degree of organ compromise (including the bone marrow) is tolerated as long as it is not life-threatening. When therapy is necessary, it usually consists of milder forms of chemotherapy that do not require hospitalization. Chemotherapeutic agents can be given singly or in regimens that combine two or more drugs. Radiation therapy is used in the preparative phases before bone marrow transplantation, in preventing central nervous system relapse, and in treating focal deposits of leukemic cells. Chemotherapy. Many effective antileukemic agents have been synthesized. Combination therapy incorporates drugs that have different modes of action and different toxicities in order to increase cytotoxic potency, account for leukemic cells that may be resistant to a single agent, and lessen cumulative toxicity in any particular organ or tissue. Most antileukemic drugs act by perturbing enzymes or substrates that are related to DNA or RNA synthesis and thus largely affect actively dividing cells. Any treatment must be repeated since the number of leukemic cells may exceed one trillion and a single course of antileukemic drugs will destroy only some of them. Among the first drugs used to treat leukemia were alkylating agents, including cyclophosphamide and chlorambucil, which act by cross-linking DNA and consequently hindering replication and transcription. Similar effects are produced by ionizing radiation. Paradoxically, it is this effect on genetic material that is thought to account for the increased risk of secondary malignancies after exposure to alkylating agents. Anthracyclines, such as doxorubicin,
daunorubicin, and idarubicin, inhibit replication by insertion between DNA base pairs and are among the most effective antileukemic agents. Plant alkaloids, such as vincristine and vinblastine, which are derived from the periwinkle, prevent cellular division by inactivating the mitotic apparatus. They are most useful when included as part of a combination regimen, since their mode of action and side effects are different from other commonly used agents. Antimetabolites have been synthesized to be molecular mimics of normal substrates required for processes that are integral to cell survival. They become incorporated into the leukemic cell and, because of chemical substitutions at important sites on the molecule, block essential DNA synthetic pathways and ultimately lead to cell death. The nucleosides pentostatin, fludarabine, and cladribine are most effective in chronic lymphocytic leukemia, lymphoma, and hairy cell leukemia. See CHEMOTHERAPY. Stem cell transplantation. Stem cell transplantation has had the most positive impact on the leukemia cure rate. Chemotherapy is administered alone or with radiation therapy in doses much higher than those used in standard antileukemic regimens to abolish the leukemic clone at the expense of the normal stem cells in the bone marrow. Patients would die following such treatment unless "rescued" with cryopreserved stem cells. The thawed stem cells are infused into a vein to circulate in the bloodstream, occupy the marrow spaces, and eventually proliferate and produce all of the various blood cells. The stem cells must come from the patients themselves (autologous stem cell transplant) or from a donor (allogeneic stem cell transplant) whose human leukocyte antigens, commonly called HLAs, match those of the patient's cells as closely as possible (HLAs determine tissue compatibility). In allogeneic stem cell transplants, stem cells are harvested from the peripheral blood of an identical twin (a syngeneic match) or a sibling or unrelated donor (allogeneic). However, the risk of potentially fatal graft-versus-host disease is significant with donor cells that are less than ideally matched. Many patients do not have eligible siblings and require unrelated matched donors. The elderly, who comprise a large segment of those with leukemias, are generally not candidates because of the toxicity involved. Nonetheless, stem cell transplantation is the therapy of choice for eligible patients with acute lymphoid, acute myeloblastic, and other high-risk leukemias who have relapsed after standard chemotherapy. Allogeneic stem cell transplantation is used for patients with chronic myelogenous leukemia who have failed imatinib treatment (see below) and rarely for patients with chronic lymphocytic leukemia and other leukemias. See STEM CELLS. Biologic response modifiers. The search for therapies which are less toxic and more specific for leukemic cells has focused on substances that are derived from natural (biologic) sources or that affect biologic reactions, some of which are thought to be part of the body's natural defense against cancer. Examples include monoclonal antibodies, small molecules that
target protein kinases that affect the cell cycle and lead to tumor cell death, cell products manufactured by recombinant DNA technology, and the patient’s own killer cells (large, granular lymphocytes that can lyse a variety of target cells) expanded and activated in the laboratory before reinfusion. Alpha-interferon can be used in certain leukemias, including hairy cell and chronic myelogenous leukemia. Recombinant growth factors, such as granulocyte-macrophage colony stimulating factor to promote leukocyte production, show potential as a therapeutic adjunct, shortening the duration of low cell counts after therapy and possibly augmenting the numbers and activity of cells that can destroy leukemic cells. Rituximab, a chimerized (comprising approximately 33% mouse protein and 67% human protein) anti-CD20 monoclonal antibody, is commonly used in the treatment of B-lymphocyte-derived malignancies, including chronic lymphocytic leukemia. Alemtuzumab is a humanized (comprising 5–10% mouse protein and 90–95% human protein) anti-CD52 monoclonal antibody approved for second-line treatment of chronic lymphocytic leukemia. Gemtuzumab ozogamicin is a humanized anti-CD33 monoclonal antibody conjugated to a potent antitumor antibiotic, calicheamicin, and is approved for the treatment of acute myeloblastic leukemia. Imatinib is a small molecule that targets the BCR-ABL tyrosine kinase unique to chronic myelogenous leukemia and sometimes found in acute leukemia. It has revolutionized the treatment of chronic myelogenous leukemia with over 80% of early diagnosed patients responding to therapy. Dasatinib is an inhibitor of multiple tyrosine kinases and has been approved by the U.S. Food and Drug Administration (FDA) for the treatment of imatinib-resistant patients with chronic myelogenous leukemia. See CELLULAR IMMUNOLOGY; MONOCLONAL ANTIBODIES. Complications and prognosis. Each type of leukemia and every therapeutic modality has unique complications. However, because of bone marrow compromise by therapy or the leukemia itself, the most common problems are low numbers of platelets, red blood cells, and neutrophils. Before the advent of blood-product transfusions, most patients—particularly those with acute leukemia— would die of hemorrhage due to low platelet numbers. Decreased production of red blood cells compounded the anemia from blood loss. Now, however, platelet and red cell transfusions can sustain patients with little to no production of their own blood cells for many months. Transfusion of neutrophils, on the other hand, has not been practical and is rarely used. Without neutrophils, patients become susceptible to infections with organisms that normally live as harmless commensals on the skin and mucous membranes and in the gastrointestinal tract. Antimicrobial therapy has succeeded in treating these infections, but the elimination of resident microbes leaves the leukemic patient open to infections caused by unusual and resistant forms of bacteria, fungi, viruses, and protozoa. Thus, infection has replaced hemorrhage as the leading
cause of morbidity and death and overshadows all other complications of therapy. Death from infections usually occurs because therapy fails either to eradicate the leukemia or to promptly restore normal blood cell production. The latter could correct the single most important cause of susceptibility to infections: decreased neutrophils. See INFECTION. Despite the differences between the acute and chronic leukemias, and despite the fact that the great majority of patients can be brought into a remission or quiescent phase of the disease, leukemia is one of the most lethal malignancies. If cure is considered to be the absence of disease 3–5 years after cessation of therapy, only a small fraction of all leukemias are curable. An exception is acute lymphoid leukemia in children, where therapeutic advances have resulted in the attainment of a complete remission and cures in the majority of children. Survival times in other leukemias vary, ranging from months in some of the acute forms to as long as 20 years in the chronic types. Overall, for adult and childhood acute myeloblastic leukemia, long-term survival is attained in only 20–30% of cases. The figures for adult acute lymphoid leukemia are only slightly better. Stem cell transplantation offers eligible patients a 40–60% chance for a cure. Biologic response modifiers have clearly had an impact on the therapy of many patients with leukemia. See CANCER (MEDICINE); ONCOLOGY. Kenneth A. Foon; Louis Vaickus Bibliography. D. Catovsky (ed.), Leukemic Cell, 2d ed., 1991; J. Fleischer (ed.), Leukemias, 1993; A. M. Mauer (ed.), The Biology of Human Leukemia, 1990; P. H. Wiernik (ed.), Neoplastic Diseases of the Blood, 3d ed., 1995.
Level measurement The determination of the location of the interface between two fluids, separable by gravity, with respect to a fixed horizontal datum plane. The respective fluids may be any fluids, liquid or gaseous, which do not mix and have specific gravities significantly different from one another. Fluids include granular or particulate solids which are fluidized or handled like fluids. The most common level measurements are, however, between a liquid and a gas or vapor. See FLUID; INTERFACE OF PHASES. Level measurement may be classified into four main categories: direct visual indication of interface location; remotely transmitted indication of interface location; interface location inferred from hydrostatic pressure; and interface location inferred from fluid properties. Level measurements may be of an analog or on-off nature. Direct visual measurement. In many cases, fluid levels may be observed directly and consequently measured to obtain trends or magnitudes in volume. Graduated scale. Level is measured directly from a vertical graduated scale partially immersed in the liquid.
Glass window. Level is observed through a transparent window in the side of a tank. The window may be graduated with a vertical scale. Gauge glass. Level is observed in a transparent vertical tube attached to a closed tank. The bottom of the tube is connected to the liquid space and the top of the tube to the gaseous space. The liquid level in the tube corresponds with the level in the tank and may be observed or measured against a graduated scale. Isolating valves usually are fitted in the upper and lower connecting pipes to allow for replacement of the transparent tube (which is usually of glass) without draining the tank (Fig. 1a). Closed tanks are often under some pressure, hence the use of external tubes able to withstand pressure rather than windows. In pressurized systems the upper and lower connecting tubes are fitted with ball check valves to avoid a dangerous discharge of fluid in the event of a tube rupture. For very high pressure systems, metallic ducts with thick glass windows are used. These windows are sometimes made refractive and artificially illuminated to show more clearly the difference between the two fluids (such as water and steam). Remote measurement from float. Where levels cannot be observed and hence measured directly, it is common to use a float and to indicate remotely the elevation of this float. The float must be of an average density between that of the two fluids, the densities of which must be significantly different (such as water and air) to ensure that a sufficient buoyant force is generated on the float, with changing level, to activate the position-sensing mechanism. The float may generate an analog signal which varies over the whole range of operating level, or may generate an on-off signal as the level rises above or falls below a predetermined elevation. A series of on-off sensors at different elevations can generate a digital type of level indication. Float with mechanical indicator. Several simple methods allow the float position and hence level to be observed indirectly from outside the vessel containing the fluids. A vertical rod attached to the float and protruding through a hole in the top of the tank can show the level by the amount of protrusion of the rod, which may be graduated. An external weight attached to the float by a rope or tape running over a pulley at the top of the tank can show the level on an inverse scale outside the tank. A rotating shaft passing through a sealed hole in the side of the tank can be connected to the float by a lever so that any rise or fall in the level rotates the shaft appropriately and so moves a pointer on an external graduated scale. Float with electrical resistance sensor. In a closed vessel a float having a lever connected to a variable-resistance sensor can cause a change in the electrical resistance as the level rises or falls. The change in electrical resistance can be used, via a suitable calibrated electrical instrument, to indicate the level or volume in the tank (Fig. 1b).
Float with magnetic switches. In a closed tank a float pivoting about a fixed point can move a magnet close to the wall of the tank down or up as the level rises or falls about a selected level. A similar magnet just outside the tank is flipped by magnetic repulsion to operate electrical contacts to give a corresponding on-off signal. Magnetic repulsion rather than magnetic attraction is employed to create a definite toggle effect (Fig. 1c). Float with buoyancy effect. A float constrained in its vertical movement will exert a varying force on the restraining mechanism. This force in turn can be measured or converted into an analog electrical signal which can be calibrated to indicate the liquid level. In such an application, level can be measured only
over the height of the float, so such floats usually have slender dimensions (Fig. 1d). Measurement by hydrostatic pressure. The pressure p at any depth h in a liquid of density ρ is given by the equation p = ρgh, where g is the acceleration due to gravity. Hence, if the density of the liquid is known, the depth of liquid above a selected point can be determined by measuring the pressure in the liquid at that point. Pressure gauge. A simple pressure gauge may be calibrated to give a direct reading of the depth of liquid. If the tank is closed and there is pressure in the space above the liquid surface, the difference in pressure between this space and the measuring point must be used.
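To make the relation concrete, the short sketch below (in Python) inverts p = ρgh to recover the depth of liquid above a gauge tapping; for a closed, pressurized tank the gas-space pressure is first subtracted. It is illustrative only; the function name and numerical values are assumptions and do not come from this article.

# Illustrative sketch: inferring liquid depth from a hydrostatic pressure reading,
# h = (p - p_gas) / (rho * g). Names and values are assumed for the example.

G = 9.81  # acceleration due to gravity, m/s^2

def depth_from_pressure(p_gauge_pa, rho_kg_m3, p_gas_pa=0.0):
    """Depth of liquid (m) above the measuring point.

    p_gauge_pa -- pressure measured at the tapping, Pa (gauge)
    rho_kg_m3  -- liquid density, kg/m^3
    p_gas_pa   -- pressure of the gas space above the liquid (closed tanks), Pa (gauge)
    """
    return (p_gauge_pa - p_gas_pa) / (rho_kg_m3 * G)

# Example: 14.7 kPa gauge measured at the bottom tapping of an open water tank.
print(depth_from_pressure(14700.0, 1000.0))  # about 1.5 m of water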
Fig. 1. Direct visual and float-type measurements. (a) Gauge glass. (b) Float with electrical sensor. (c) Float with magnetic switches. (d) Float with buoyancy effect.
Fig. 2. Hydrostatic pressure and acoustic-wave measurements. (a) Pressure diaphragm. (b) Bubble tube. (c) Manometer. (d) Acoustic.
Pressure diaphragm. In applications where the liquid may be contaminated with aggressive impurities or contain solids or sludge, pressure gauges or their pressure-measuring tappings may become blocked and unresponsive. In such cases, pressure diaphragms installed flush with the inside surface of the vessel may be used. Movement of the diaphragm against a spring may be measured directly to give an indication of the pressure or depth of liquid. However, since such movement is limited, it is better to apply a corresponding pressure to the back of the diaphragm so as to restore the diaphragm to its neutral position. Measurement of this external pressure then can be converted into a measure of the depth of liquid (Fig. 2a). Bubble tube. In order to discharge gas from a pipe or vessel into a liquid at some depth, the pressure of the gas must be at least equal to that of the liquid at that depth. Hence the depth of liquid at the point of
discharge can be determined from the gas pressure, provided the gas flow rate is low enough to eliminate dynamic or frictional effects. A pipe supplied with a steady but low flow of gas (such as air) may be inserted into a tank of liquid and the gas pressure in the pipe measured. This pressure measurement can then be converted into a measurement of the depth of liquid above the point of discharge (Fig. 2b). Manometer. A manometer may be used instead of a pressure gauge for measuring pressure (more correctly pressure difference). A manometer using mercury as the reference liquid reduces the level variation by a factor of about 13, making direct measurement more convenient, and is more sensitive than a pressure gauge. Its location relative to that of the vessel in which the liquid level is being measured usually necessitates twin pipes from each side of the manometer extending back to the measuring points and filled with the same liquid as in the vessel (Fig. 2c).
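The factor of about 13 follows from the density ratio between mercury and the tank liquid: a level change Δh alters the differential pressure by ρ(liquid)gΔh, which is balanced by a mercury deflection Δd such that [ρ(mercury) − ρ(liquid)]gΔd equals the same amount. The short sketch below works this out for water over mercury; it is illustrative only, and the densities and function name are assumptions rather than values given in this article.

# Illustrative sketch: mercury-manometer deflection produced by a change in liquid level.
# Density values and names are assumptions for the example.

RHO_MERCURY = 13600.0  # kg/m^3
RHO_WATER = 1000.0     # kg/m^3

def manometer_deflection(delta_level_m, rho_liquid=RHO_WATER, rho_manometer=RHO_MERCURY):
    """Mercury deflection (m) for a change delta_level_m (m) in tank level.

    Both legs are assumed to be filled with the tank liquid above the mercury, so the
    effective density difference driving the deflection is (rho_manometer - rho_liquid).
    """
    return delta_level_m * rho_liquid / (rho_manometer - rho_liquid)

# A 1.0-m change in water level moves the mercury by only about 0.08 m;
# that is, the reading is compressed by a factor of roughly 13.
print(manometer_deflection(1.0))  # about 0.079 m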
Measurement by fluid properties. Certain fluid properties can be readily measured or used to determine the presence or extent of a known fluid. The presence or absence of a fluid can provide an on-off signal indicating whether a certain level has been reached or not, whereas the extent of a fluid can be used to determine the depth. Conductivity. If a liquid is a conductor of electricity, its presence can be detected by a pair of electrodes subject to a potential difference. When immersed by a rising level, they generate an off-on electrical signal. Capacitance. If a liquid is a dielectric, probes can be inserted into a tank and the capacitance between them measured. This will vary with the degree of immersion and can be converted to a measurement of level. See CAPACITANCE. Acoustic. Most liquids conduct sound waves readily, and these are reflected from any interfaces, including the liquid surface. If an acoustic transmitter-receiver located at the bottom of a tank directs sound waves vertically upward and senses their reflection from the surface, the depth, and hence the level, can be determined from the time taken for the sound wave to travel up to the surface and be reflected back down (Fig. 2d). See SOUND. Nuclear. Since gamma rays are absorbed by many liquids, the presence of liquid can be sensed by the attenuation of gamma rays emitted from a gamma-ray source and measured by a detector a short distance away. See GAMMA RAYS. Thermal. If an electrically heated thermistor is immersed in a liquid, it is cooled more effectively. The resulting drop in temperature is reflected as a change in resistance which indicates the presence of the liquid. See THERMISTOR. Robin Chaplin Bibliography. E. O. Doebelin, Measurement Systems: Application and Design, 5th ed., McGraw-Hill, New York, 2003.
Lever A pivoted rigid bar used to multiply force or motion, sometimes called the lever and fulcrum (Fig. 1). The lever uses one of the two conditions for static equi-
Fig. 1. The lever pivots at the fulcrum.
Fig. 2. Two applications of the lever. (a) Nutcracker. (b) Paper punch.
librium, which is that the summation of moments about any point equals zero. The other condition is that the summation of forces acting in any direction through a point equals zero. See INCLINED PLANE. If moments acting counterclockwise around the fulcrum of a lever are positive, then, for a frictionless lever, FBb − FAa = 0, which may be rearranged to give Eq. (1),

FB = (a/b) FA        (1)

If FB represents the output and FA represents the input, the mechanical advantage, MA, is then given by Eq. (2).

MA = FB/FA = a/b        (2)
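As a numerical check of Eqs. (1) and (2), a lever with moment arms a = 0.20 m and b = 0.05 m has MA = 0.20/0.05 = 4, so an input force of 50 N can deliver about 200 N at the output when friction is neglected. A minimal sketch of this calculation in Python follows; the function names and numbers are illustrative assumptions, not part of the article.

# Illustrative sketch: ideal (frictionless) lever, following Eqs. (1) and (2).

def mechanical_advantage(a, b):
    """Return MA = a/b for input moment arm a and output moment arm b (same units)."""
    return a / b

def output_force(f_a, a, b):
    """Return FB = (a/b) * FA for an ideal lever with input force f_a."""
    return mechanical_advantage(a, b) * f_a

a, b = 0.20, 0.05   # moment arms, m
f_a = 50.0          # input force, N
print(mechanical_advantage(a, b))  # 4.0
print(output_force(f_a, a, b))     # 200.0 N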
Applications of the lever range from the simple nutcracker and paper punch (Fig. 2) to complex multiple-lever systems found in scales and in testing machines used in the study of properties of materials. See SIMPLE MACHINE. Richard M. Phelan