Some Applications of Statistics to Meteorology
Evan Pugh Research Professor of Atmospheric Sciences, The Pennsylvania State
Chief, Meteorology S
U.S. Wea
College of Earth and Mineral Sciences
The Pennsylvania State University
University Park
... have always been used in connection with ... the appearance of a halo ... falling pressure is really based on a ... types of phenomena. Sometimes, ...
... reprint Table ..., abridged from Table IV of ...
PANOFSKY
w. BRIER
CONTENTS

Introduction
Foreword
Contents
Tables

Chapter I. FREQUENCY DISTRIBUTIONS
    Introduction
    The Frequency Distribution
    The Histogram
    The Cumulative Frequency Distribution
    The Frequency Distribution in Relation to Probability
    The Probability Histogram
    Frequency Distributions of Vectors
    Averages
    Computation of the Mean
    Evaluation of Median and Mode
    Statistics Versus Parameters
    Wind Averages
    Degree Days
    Measures of Variability
    The Mean Deviation
    The Standard Deviation
    Variance
    Skewness
    Kurtosis

Chapter II. THEORETICAL FREQUENCY DISTRIBUTIONS
    Introduction
    The Binomial Distribution
    The Normal Distribution
    The Poisson Distribution
    Transformations

Chapter III. SAMPLING THEORY
    Introduction
    Binomial Testing Generally
    More Than Two Categories
    Significance of Means
    Difference of Means of Independent Samples

FORECAST VERIFICATION
    Introduction
    Purposes of Verification
    Fundamental Criteria to be Satisfied
    Verification Methods and Scores
    Control Forecasts for Comparison
    Significance of Forecasts
    Pitfalls of Verification

Appendix

TABLES
Chapter I
    ... Distribution of Wind Speed and Direction
Chapter III
Suppose we wanted to know the number of temperatures below a value, for example 33°, that does not happen to be one of the mathematical limits. Or, inversely, suppose we wanted to know that value of temperature below which lies a certain percentage of the temperatures. This can be found by linear interpolation in the CF- or CF(%)-columns of Table 2, or the answer may be read directly from the pictorial representation of a cumulative frequency distribution shown in Figure 2. That figure is called an "ogive."

FIGURE 2
Ogive from Table 2. (Axes: CF, % against Temp., °F; median 29.4.)

... questionable use. An example of this is daily rainfall amounts. The mean daily rainfall at many places in the United States, for example, is between 0.10 and 0.20 inches, yet the most common amount of rainfall is much smaller; the large average is produced by relatively rare days of rainfall amounts greater than 1/2 inch. For other variables, such as pressure, wind speed and temperature, the mean is quite adequate, and is used almost exclusively.
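The linear interpolation in a cumulative frequency table described above can be sketched in Python. Table 2 is not reproduced in this text, so the class boundaries and frequencies below are hypothetical and serve only to illustrate the procedure; the function names are likewise illustrative.

```python
# Hypothetical cumulative frequency table; these are NOT the values of Table 2.
LOWER_BOUND = 15.95                                    # assumed lower boundary, deg F
UPPER_BOUNDS = [19.95, 23.95, 27.95, 31.95, 35.95, 39.95]
FREQUENCIES = [5, 15, 30, 25, 15, 10]                  # assumed class frequencies


def cumulative(freqs):
    """Running totals of the class frequencies (the CF column)."""
    total, out = 0, []
    for f in freqs:
        total += f
        out.append(total)
    return out


def count_below(value, bounds, cf, lower):
    """Number of observations below `value`, by linear interpolation in CF."""
    if value <= lower:
        return 0.0
    prev_x, prev_cf = lower, 0
    for x, c in zip(bounds, cf):
        if value <= x:
            return prev_cf + (c - prev_cf) * (value - prev_x) / (x - prev_x)
        prev_x, prev_cf = x, c
    return float(cf[-1])


def value_at_percent(pct, bounds, cf, lower):
    """Inverse problem: the value below which `pct` percent of the data lie."""
    target = pct / 100.0 * cf[-1]
    prev_x, prev_cf = lower, 0
    for x, c in zip(bounds, cf):
        if target <= c:
            return prev_x + (x - prev_x) * (target - prev_cf) / (c - prev_cf)
        prev_x, prev_cf = x, c
    return bounds[-1]


CF = cumulative(FREQUENCIES)
below_33 = count_below(33.0, UPPER_BOUNDS, CF, LOWER_BOUND)
median = value_at_percent(50, UPPER_BOUNDS, CF, LOWER_BOUND)
```

Reading the ogive graphically is the same operation: `count_below` moves from the temperature axis to the CF axis, and `value_at_percent` moves in the opposite direction.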
Of the measures of central tendency, the mean is most affected by extreme observations and is least useful for variables for which extreme deviations from "typical" values occur relatively frequently, and occur in one direction only. The mode, defined as the most probable value of the variable, is not at all influenced by extremes; the median, the halfway point in the frequency distribution, is influenced by the number but not the value of extreme members of the frequency distribution.

A special kind of mean in meteorological usage is referred to as a "normal." This is usually the mean over many years for a given day of the year or a given month of the year. Daily normals are smoothed by some process, because simple means have a tendency to show irregular oscillations which are, presumably, accidental. For example, a 30-year average temperature for March 4 may be lower than similar means for both March 3 and March 5.¹ The smoothing can be accomplished, for example, by computing weekly means and plotting them at the mid-point of the corresponding week. Monthly normals usually need no smoothing of this type, because the number of individual observations averaged is extremely large.

In the construction of normals, the attempt is made to average the data at neighboring stations over the same periods. If one station has a shorter record than stations in the neighborhood, or if some data are missing, the mean computed from this short record can be corrected by the use of the information in the environs. For example, if this short period was cooler than the complete period at the neighboring station, the result obtained from the short period at the new station is increased correspondingly.
¹ Not all meteorologists agree on the advisability of smoothing such "singularities," as they are called, believing them to be significant.

It is not at all obvious that the best normal is the one computed over the longest period of time. It is probably fair to say, at least for many practical purposes, that a normal should be the best estimate for the next value of the quantity averaged. For example, a normal mean annual temperature should be a good estimate of next year's mean temperature. Now, temperatures have been rising gradually over the last hundred years. Hence, a normal temperature based on a hundred years is considerably colder than most annual temperatures experienced in the middle of the twentieth century. A study by one of the authors has shown that a mean temperature over 15 years gives the best estimate for next year's mean, and should therefore be preferable to normals computed over longer periods. A similar result was found for precipitation.

Some frequently used means are averages over only two values. The best known quantities of this type are the mean daily temperature (usually an average between minimum and maximum) and the earth's mean distance from the sun (also an average between minimum and maximum, which also happens to be the semimajor axis of the earth's orbit).
9. Computation of the Mean.

The most obvious way of computing a mean consists simply in adding all the data and dividing the sum by the number of data. In fact, this procedure is generally used when a computing machine is available. When all the observations are close to a central value, the labor can be shortened somewhat. For example, all the pressures in a given sample may be between 990 mb and 1033 mb, or between 691 mb and 719 mb. In situations such as this, it is advantageous to choose an arbitrary origin, A, somewhere close to the data, preferably a round number such as 1000, or 700. The mean can then be computed by:

$$\bar{X} = A + \frac{\sum \delta}{N}$$

where $\delta$ denotes the differences of the individual observations from the arbitrary origin. These differences often contain one or two fewer digits than the original values. The procedure outlined here is particularly useful in the case of pressure, density and potential temperature; the variation of all of these quantities is considerably smaller than their absolute value. Some statisticians prefer to choose A in such a way that all the deviations $\delta$ are positive; for example, they would have used 690 mb or 990 mb in the above examples.
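As a minimal sketch of the arbitrary-origin computation, the following Python snippet compares it with the direct mean. The pressure sample and the function name are hypothetical, chosen only to match the range quoted in the text.

```python
def mean_with_origin(values, origin):
    """Mean computed as origin + (sum of deviations from the origin) / N."""
    deviations = [v - origin for v in values]   # small numbers, easy to add by hand
    return origin + sum(deviations) / len(values)


pressures_mb = [1003, 998, 1012, 991, 1007, 1021]   # hypothetical sample
shortcut = mean_with_origin(pressures_mb, 1000)      # deviations: 3, -2, 12, -9, 7, 21
direct = sum(pressures_mb) / len(pressures_mb)
```

Both routes give the same mean; the choice of origin affects only the size of the numbers being added, not the result.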
When the mean need not be completely accurate, and a frequency distribution is available with equal class intervals, the so-called "short method" can be used. This method saves a great deal of time and eliminates the necessity of an adding or computing machine even with large samples, but is not completely precise. The formula for the mean is:

$$\bar{X} = A + \frac{i \sum f d}{N}$$

Here, the summation extends over the different classes only, and therefore has many fewer terms than the original formula. i is the class interval, as before, and A the arbitrary origin, chosen at the class mark of a group, preferably the most populated group. d represents the quantity $(X_c - A)/i$, where $X_c$ is the class mark of each group. The quantity d is either a positive or negative integer; it represents the number of groups a given group is distant from the group containing the arbitrary origin.
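The short method can be sketched as follows. The class marks, the class interval of 4 and the origin 25.95 follow the temperature classes used in this chapter, but the frequencies are hypothetical, so the resulting mean is illustrative only.

```python
def short_method_mean(class_marks, freqs, origin, interval):
    """Grouped mean: origin + interval * sum(f * d) / N, with d = (Xc - origin) / interval."""
    n = sum(freqs)
    # d is the whole number of classes separating each class from the origin class.
    d = [round((xc - origin) / interval) for xc in class_marks]
    return origin + interval * sum(f * di for f, di in zip(freqs, d)) / n


class_marks = [17.95, 21.95, 25.95, 29.95, 33.95, 37.95]   # class marks, deg F
freqs = [5, 15, 30, 25, 15, 10]                             # hypothetical frequencies
mean_short = short_method_mean(class_marks, freqs, origin=25.95, interval=4)

# The same grouped mean computed the long way, for comparison.
mean_long = sum(f * xc for f, xc in zip(freqs, class_marks)) / sum(freqs)
```

With these assumed frequencies, d runs from -2 to 3 and the sum of f times d is 60, so only small integers have to be multiplied and added.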
Table 4 gives an example of the calculation of the mean by the short method for the frequency distribution of temperature given in Table 1, No. 1. The arbitrary origin was chosen as the center of the third group, 25.95.

Class Interval, °F    Xc, °F
16.0-19.9             17.95
20.0-23.9             21.95
24.0-27.9             25.95
28.0-31.9             29.95
32.0-35.9             33.95
36.0-39.9             37.95
59
48
answer only if the mean of each group is equal to the central value of each group. If all observations were situated at the upper or lower limit of each group, the maximum error of i/2 in the mean might ensue. To some extent, this error can be controlled by choosing the exact class limits before the frequency distribution is started. However, if the slope of the frequency polygon is large, the mean of each group does not lie at its center. For example, if the frequencies are increasing rapidly, the mean of the groups is above the center, and vice versa.
10. Evaluation of Median and Mode.

The evaluation of the median follows from Section 4. It proceeds best by linear interpolation of a cumulative frequency distribution or by reading the variate corresponding ...
Climatologists have found it convenient to define "persistence" by the ratio:

$$P = \frac{\text{Speed of the Resultant Wind}}{\text{Mean Wind Speed}}$$

If the wind always blows from the same direction, the persistence is 1; if it is equally likely from all directions, or blows half the time from one direction and half the time from the opposite, it is 0. It is clear from these properties why the quantity P measures the persistence. In the example above, the mean wind speed comes out to 17 mph, and the persistence 47%. In general, a statement of the resultant wind or the prevailing wind direction without a statement of persistence of that wind means very little.
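The two limiting cases of the persistence can be checked with a short Python sketch. The observations are hypothetical, and the sketch assumes each observation is a (speed, direction in degrees) pair whose components are averaged to give the resultant wind.

```python
import math


def persistence(winds):
    """Ratio of the resultant (vector-mean) wind speed to the mean wind speed.

    `winds` is a list of (speed, direction_deg) pairs. Whether the direction is
    the one the wind blows from or toward does not change the ratio.
    """
    n = len(winds)
    u = sum(s * math.sin(math.radians(d)) for s, d in winds) / n   # east-west component
    v = sum(s * math.cos(math.radians(d)) for s, d in winds) / n   # north-south component
    resultant_speed = math.hypot(u, v)
    mean_speed = sum(s for s, _ in winds) / n
    return resultant_speed / mean_speed


steady = persistence([(10, 270), (20, 270), (15, 270)])   # always the same direction
opposed = persistence([(10, 90), (10, 270)])              # equal speeds, opposite directions
mixed = persistence([(10, 0), (10, 90)])                  # two perpendicular winds
```

A steady wind gives a ratio of 1, equal and opposite winds give 0, and the perpendicular pair falls in between at about 0.71.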
13. Degree Days.

In the case of temperature data, the number of degree days has been widely used as a figure summarizing considerable information. The number of degree days is given by:

$$N = \sum (65 - t)$$

where t is the mean of minimum and maximum temperature on specific days in degrees Fahrenheit, and the summation extends over all days with temperatures under 65° F. For example, if the mean daily temperatures in a week are 50°, 70°, 60°, 55°, 60°, 62°, 50°, the day with an average temperature of 70° would be omitted in the calculation. The total number of degree days would then be 53.

The usefulness of degree days is due to the fact that the number of degree days is proportional to the amount of fuel needed for home heating in a given period, neglecting, of course, the effect of wind. Thus, for example, fuel dealers supply their customers with additional fuel at regular degree day intervals. The colder the weather, the shorter the period corresponding to a given number of degree days. Note that the number of degree days cannot be computed from mean temperature, since temperatures above 65° may enter the mean.

As another example of the use of degree days, consider the problem of determining whether the installation of insulation in a given home has cut fuel consumption. It may be necessary to compare the consumption in a mild winter with that in a hard winter. Then, the fuel consumption in each season, divided by the number of degree days in the season, takes into account any change in the kind of winter; and two such ratios in different winters can be compared to show whether the fuel consumption ...
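The week quoted in the text can be worked through in a few lines of Python. The function name is illustrative; the temperatures and the 65° base are those of the example.

```python
def degree_days(mean_daily_temps_f, base=65):
    """Sum of (base - t) over the days whose mean temperature is below the base."""
    return sum(base - t for t in mean_daily_temps_f if t < base)


week = [50, 70, 60, 55, 60, 62, 50]    # mean daily temperatures from the text, deg F
total = degree_days(week)               # 15 + 5 + 10 + 5 + 3 + 15

# Using the weekly mean temperature instead lets the 70-degree day offset the
# cold days, which is why degree days cannot be obtained from the mean alone.
weekly_mean = sum(week) / len(week)
from_mean = len(week) * (65 - weekly_mean)
```

Here `total` is 53, in agreement with the text, while the figure derived from the weekly mean is only 48.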
14. Measures of Variability.
In order to characterize a meteorological variable, an average is often not sufficient. For example, the annual average temperatures at San Francisco and New York are nearly the same; yet the climate at the two places is quite different due to the much greater variability of temperature at New York. As another example, a contractor would like to base his bid on the number of days lost due to rain. But the mean number of days lost in a certain season is not sufficient information on which to calculate the risk. The variation of rainy days from one year to the next is equally important.
18. Skewness.
A frequency distribution is said to be positively skew if the mean is greater than the mode, negatively skew if the mean is less than the mode. A coefficient of skewness should be dimensionless, and can be defined by

$$sk = \frac{\bar{X} - \text{Mode}}{s}$$

Since the mode is difficult to estimate, and approximately $\bar{X} - \text{Mode} = 3(\bar{X} - \text{Median})$,

$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$

The quantity $\sum x^3$, where x denotes the deviation of an observation from the mean, also has the correct sign for a coefficient of skewness. Therefore, another coefficient of skewness has also been defined as:

$$sk = \frac{\sum x^3}{N s^3}$$
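The median-based and third-moment coefficients of skewness can be compared on a small sample. The sketch below uses hypothetical wind speeds with a long upper tail, and it assumes that s is the standard deviation computed with divisor N, since the definition of s is not part of this excerpt.

```python
import statistics


def skewness_coefficients(data):
    """Return (3 * (mean - median) / s, sum(x**3) / (N * s**3)), x = deviation from mean."""
    n = len(data)
    mean = statistics.fmean(data)
    median = statistics.median(data)
    s = statistics.pstdev(data)          # assumption: divisor N, not N - 1
    sk_median = 3 * (mean - median) / s
    sk_moment = sum((x - mean) ** 3 for x in data) / (n * s ** 3)
    return sk_median, sk_moment


wind_speeds = [4, 6, 7, 8, 8, 9, 10, 12, 15, 21, 30]   # hypothetical, long right tail
sk_median, sk_moment = skewness_coefficients(wind_speeds)

symmetric = [1, 2, 3, 4, 5]
sym_median, sym_moment = skewness_coefficients(symmetric)
```

Both coefficients come out positive for the long-tailed sample and zero for the symmetric one, although the two definitions do not in general give the same number.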
FIGURE 4
Frequency Distribution of Wind Speed at La Guardia Field, 1932-1947. (Axes: Frequency, f against Wind Speed, Knots.)
Figure 4 shows a typical positively skew frequency distribution, that of wind speed. The mean wind speed is usually greater than the most probable wind speed, due to the influence of the relatively few wind speeds of large magnitude. Figure 4 shows the distribution of wind speed at New York, March 1932-47 (1934 omitted). The mean wind speed is 16.7 mph; the mode can be estimated as 12.7 mph. Frequency distributions will generally be skew when there is a physical cut-off close to the observed range of the observations. In the case of wind speed, for example, the speed cannot be negative; hence, wind speed distributions are positively skew. Similarly, rain cannot be negative; hence, frequency distributions of daily rainfall are extremely skew. However, annual rainfall amounts in reasonably humid climates are fairly symmetrical; this is because the cut-off of zero rainfall is far outside of the range of existing observations.
19. Kurtosis.

Two frequency distributions may have the same mean, dispersion, and skewness, but may differ in "kurtosis." One distribution may have relatively few cases near the center, so that the histogram appears flat (low kurtosis), or most of the observations may lie near the center (high kurtosis). The quantity $\sum x^4$ is proportional to kurtosis, and

$$\frac{\sum x^4}{N s^4}$$

is defined as the coefficient of kurtosis.

A case of particularly low kurtosis might be the frequency distribution of cloudiness. This might be a symmetric ...
CHAPTER II
Theoretical Frequency Distributions

1. Introduction.

In Chapter I, the probability density histogram was mentioned. This histogram was based on the variation of the quantity f/Ni as a function of the variate, say, X. Each box in the histogram had a height of f/Ni and a width of i. The entire area of this histogram was $i \sum f/Ni = 1$. Now suppose that more and more data are added. Then, as N increases, the class interval can be made successively smaller without affecting the area. As N approaches infinity (that is, we eventually include all possible data of the same general type), and as i approaches zero, the ratio f/Ni tends to p, a finite limit, provided that the area has been held constant and equal to 1. Graphically, this means that, as the width of the boxes in the histogram approaches zero, the step-like top of the histogram becomes a smooth curve ...
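The statement that the area of the probability density histogram equals 1 whatever the class interval can be checked numerically. The frequencies below are hypothetical, and the function name is illustrative.

```python
def density_histogram(freqs, interval):
    """Box heights f / (N * i) and the total area i * sum(f / (N * i))."""
    n = sum(freqs)
    heights = [f / (n * interval) for f in freqs]
    area = interval * sum(heights)
    return heights, area


# Hypothetical frequencies with a class interval of 4 units.
coarse_heights, coarse_area = density_histogram([5, 15, 30, 25, 15, 10], interval=4)

# More data sorted into narrower classes (interval 2): the heights change,
# but the total area is still 1.
fine_heights, fine_area = density_histogram(
    [4, 8, 14, 20, 30, 34, 28, 24, 16, 12, 6, 4], interval=2
)
```

Each height is a probability per unit of the variate, which is why it stays finite as the class interval shrinks while the frequency in any single class does not.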
Note that, in this case, there is no difference between probability and probability density, since the class interval is 1.

The mean of the binomial distribution is Np. For example, if the probability of a head is 0.8, and 10 coins are spun, the mean of the distribution will be at 8 heads and 2 tails. The symmetry of the distribution exists only for unbiased coins, i.e., when $p = 1 - p = 1/2$. The standard deviation of the binomial distribution is $\sqrt{N(1 - p)p}$.

TABLE 6
Binomial Distribution for p = 1/2, N = 10

Number of Heads    Probability
0                  1/1024
1                  10/1024
2                  45/1024
3                  120/1024
4                  210/1024
5                  252/1024
6                  210/1024
7                  120/1024
8                  45/1024
9                  10/1024
10                 1/1024
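Table 6 and the formulas for the mean and standard deviation can be reproduced with exact fractions. This is a minimal sketch; the function name is illustrative.

```python
from fractions import Fraction
from math import comb, sqrt


def binomial_pmf(n, p):
    """Probabilities of 0, 1, ..., n successes in n independent trials."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


table6 = binomial_pmf(10, Fraction(1, 2))      # unbiased coins, N = 10

mean = sum(k * pk for k, pk in enumerate(table6))                     # equals N p
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(table6))   # equals N p (1 - p)
std_dev = sqrt(variance)

# Biased coins (p = 0.8): the mean moves to 8 heads and the symmetry is lost.
biased = binomial_pmf(10, Fraction(4, 5))
biased_mean = sum(k * pk for k, pk in enumerate(biased))
```

With p = 1/2 the list reads 1/1024, 10/1024, 45/1024 and so on, symmetric about 5 heads; with p = 0.8 the probability of 10 heads is far larger than that of 0 heads.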
As mentioned before, the binomial distribution is quite useful before passage to the limit. It is applicable whenever statements concerning two alternatives only are required. One type of problem comes up frequently in research. The question is to determine qualitatively the influence of one variable on another. Then, the cases are often divided into two groups: those for which the influence seems to be in one direction, and those where the influence seems to be in the other. If there were no relation at all, the groups should be of equal size. If the groups are of unequal size, the binomial distribution gives a clue as to the probability that the two types of influences are really equally probable, and that the different size of the observed groups is just due to random variation between experiments. If the probability is relatively high, there is a good chance that the indicated relation is not real. This problem will be discussed more fully in Chapter III.

Another application of the binomial distribution arises when we are interested in only two types of meteorological phenomena, for example, rain and no rain, cloudy and clear, temperatures below freezing and temperatures above freezing. In these cases we can often determine the "bias," p, from the past record and make statements concerning probability of occurrence. For example, the records at a town in Florida might indicate that in July thunderstorms occurred on the average every third day. Then p = 1/3. Assuming that the occurrence of a thunderstorm on a given day is independent of whether a thunderstorm occurred the previous day, we can determine the probability that, for example, there will be a thunderstorm every day for a week, $(1/3)^7$; or that there will be no thunderstorms for a week, $(2/3)^7$; or for any other number of thunderstorms.
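The thunderstorm probabilities follow directly from the binomial formula with p = 1/3 and seven days. A short sketch, with illustrative names and assuming independence from day to day as stated in the text:

```python
from fractions import Fraction
from math import comb

P_STORM = Fraction(1, 3)   # a thunderstorm on the average every third day
DAYS = 7


def prob_storm_days(k, days=DAYS, p=P_STORM):
    """Probability of exactly k thunderstorm days among `days` independent days."""
    return comb(days, k) * p**k * (1 - p) ** (days - k)


every_day = prob_storm_days(7)     # (1/3)**7
no_storms = prob_storm_days(0)     # (2/3)**7
two_days = prob_storm_days(2)      # any other number of thunderstorm days
```

A week with a thunderstorm every day has probability 1/2187, while a week with none has probability 128/2187, roughly 6 percent.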
3. The Normal Distribution.

The binomial distribution as defined in the previous section will spread out indefinitely as the number of coins is increased. In order to prevent the spreading out, the standard deviation has to be held constant during the limiting process. This can be ...

$$\tau = \frac{X - \bar{X}}{\sigma}$$

Here, $\sigma$ is used for standard deviation instead of s, because we are dealing with all possible data. As N is increased indefinitely, the probability that an observation lies in the interval $d\tau$ is:

$$p \, d\tau = \frac{1}{\sqrt{2\pi}} e^{-\tau^2/2} \, d\tau$$
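The passage to the limit can be illustrated numerically. The sketch below assumes the standardized variable is the number of heads minus Np, divided by the binomial standard deviation, so that one head corresponds to a class interval of 1/σ in τ; the function names are illustrative.

```python
from math import comb, exp, pi, sqrt


def normal_density(tau):
    """The limiting density p = exp(-tau**2 / 2) / sqrt(2 * pi)."""
    return exp(-(tau**2) / 2) / sqrt(2 * pi)


def standardized_binomial(n, p, k):
    """Return (tau, density) for k successes in n trials.

    The probability of exactly k successes is divided by the class interval
    1 / sigma, i.e. multiplied by sigma, to give a probability density in tau.
    """
    mean = n * p
    sigma = sqrt(n * p * (1 - p))
    prob = comb(n, k) * p**k * (1 - p) ** (n - k)
    return (k - mean) / sigma, prob * sigma


# N = 400 unbiased coins: sigma is 10, so 210 heads lies one standard deviation out.
tau_center, dens_center = standardized_binomial(400, 0.5, 200)
tau_one, dens_one = standardized_binomial(400, 0.5, 210)
```

Even at N = 400 the standardized binomial values agree with the normal density to about three decimal places at both points.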