“Inference for the
Mean of a Population”
In Chapter 10, we looked at two forms of FORMAL
INFERENCE:
(i) Confidence Intervals
(ii) Tests of Significance
We had to make an unrealistic assumption, though, in order
to first understand the basic process.
It was unrealistic to assume we knew σ, the standard deviation of the population.
In Chapter 11, we are going to examine how to adjust for not really knowing σ.
When we looked at steps involved with performing a test
of significance in Chapter 10, we said there were 4 steps:
(1) State the hypotheses.
(2) Calculate the test statistic.
(3) Find the p-value.
(4) Interpret the results in context.
WELL … there are actually 5 steps, and now that we are going to quit assuming we know σ, we are going to add in that additional step.
This step you can label as STEP 0 because it actually
will occur prior to even stating your hypotheses:
(0) State and address assumptions!
This new step is CRITICAL for success on the AP
Exam!! Without stating and addressing
assumptions, you CAN NOT score a 4 on that free response question!!
For the inference we’ve examined so far (inference for
the mean), there are 2 assumptions:
(1) The sample is an SRS.
(2) The observations from the population have a normal distribution.
At the start of your significance test, you need to
write:
ASSUMPTIONS:
1. Sample is an SRS.
2. Distribution of observations is normal. Check by sketching a plot.
Since we cannot know σ, the best we can do is use s (the sample standard deviation) to estimate it … just like you would use x̄ to estimate μ.
THEREFORE: the standard error of x̄ is s/√n.
If we use this estimate in calculating the test statistic, then we get:
t = (x̄ – μ₀) / (s/√n)
Using s introduces more uncertainty (more spread and variation), so the resulting distribution is NOT the standard normal (or Z) distribution.
The resulting distribution is called a t-distribution with n – 1 degrees of freedom.
You can abbreviate this using the notation t(k), where k = df = n – 1.
**Also referred to sometimes as the Student’s t
distribution.
The t-distribution is actually a family of curves.
The accompanying test statistic is called the
One sample t test statistic = (x̄ – μ₀) / (s/√n)
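As a rough illustration, here is a minimal Python sketch of the one sample t statistic, t = (x̄ – μ₀) / (s/√n). The function name `one_sample_t` and the sample values are made up for this example and do not come from the text.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, df, standard_error) for testing H0: mu = mu0."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)      # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)           # standard error of x-bar
    t = (xbar - mu0) / se
    return t, n - 1, se

# Hypothetical sample (NOT from the textbook)
sample = [4, 6, 5, 7, 8]
t_stat, df, se = one_sample_t(sample, mu0=5)
print(f"t = {t_stat:.3f}, df = {df}, SE = {se:.3f}")
```

Note that `statistics.stdev` divides by n – 1, which is the s used in these notes.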
The t-distribution is very similar to the z-distribution
in two ways:
(a) Both are symmetric and
centered about 0.
(b) Values along both
distributions are standardized and expressed in units of standard deviations
from the mean.
**The big difference between the two is in the
spread. Look at the figure on p. 618.
The t-distribution, since it has more variation, is flatter with more area in the tails.
Notice also that as the df (degrees of freedom) increases
(meaning that the sample size is increasing), the t-distribution becomes more
and more normal … closer and closer to the z-distribution, N(0,1).
If you allow the df to increase without bound (in other words, allow n → ∞), you would get the z-distribution.
That’s why the z* row of the t-table is
labeled as ∞ … it’s referring to an ever-increasing degree of freedom.
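To see the t-distribution approach the z-distribution numerically, the sketch below compares the upper-tail area beyond 2 for several df against the standard normal. It uses only the Python standard library. The helper names (`t_pdf`, `z_pdf`, `upper_tail`) and the choice of Simpson's rule for the area are assumptions made for this illustration, not something from the text.

```python
import math

def t_pdf(x, df):
    """Density of the t(df) distribution (log-gamma avoids overflow for large df)."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def z_pdf(x):
    """Density of the standard normal distribution N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def upper_tail(pdf, a, steps=2000):
    """P(X >= a) for a symmetric density and a >= 0: 0.5 minus the area on [0, a]."""
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)   # Simpson's rule weights
    return 0.5 - total * h / 3

z_tail = upper_tail(z_pdf, 2.0)
t_tails = {df: upper_tail(lambda x, k=df: t_pdf(x, k), 2.0)
           for df in (2, 5, 30, 1000)}

for df, tail in t_tails.items():
    print(f"df = {df:>4}: P(T >= 2) = {tail:.4f}")
print(f"z        : P(Z >= 2) = {z_tail:.4f}")
```

The tail areas shrink toward the normal value as df grows, which is the "more area in the tails" idea expressed in numbers.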
How this will affect our confidence intervals and
hypothesis tests …
We will be using critical values (now t*
instead of z*) and p-values from the t(k) distribution.
For CI’s:
x̄ ± t* · s/√n
t* is the upper (1 – C)/2 critical value of the t(n – 1) distribution.
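A minimal sketch of the one-sample t interval, x̄ ± t* · s/√n, follows. The critical value is passed in by hand, as if read from the t-table (2.776 is the 95% value for df = 4). The function name `t_interval` and the data are hypothetical.

```python
import math
import statistics

def t_interval(data, t_star):
    """Return (low, high) for the interval x-bar +/- t* * s / sqrt(n)."""
    n = len(data)
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # standard error of x-bar
    margin = t_star * se                         # margin of error
    return xbar - margin, xbar + margin

# Hypothetical sample with n = 5, so df = 4 (NOT from the textbook)
sample = [4, 6, 5, 7, 8]
T_STAR_95_DF4 = 2.776    # t-table value: df = 4, C = 95%

low, high = t_interval(sample, T_STAR_95_DF4)
print(f"95% CI for mu: ({low:.3f}, {high:.3f})")
```

Compare with z* = 1.960 for the same confidence level: the t interval is wider, which is the price of estimating σ with s.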
For Hypothesis Tests: no real difference
Ha: μ > μ₀ (p-value = P(T ≥ t))
Ha: μ < μ₀ (p-value = P(T ≤ t))
Ha: μ ≠ μ₀ (p-value = 2P(T ≥ |t|))
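The p-value depends on which alternative hypothesis you stated. The sketch below computes it for the three usual alternatives (μ > μ₀, μ < μ₀, μ ≠ μ₀) by numerically integrating the t(df) density with the standard library. The names `p_value` and `upper_tail`, the alternative labels, and the numbers in the usage lines are assumptions for illustration only. On the exam you would use the t-table or your calculator.

```python
import math

def t_pdf(x, df):
    """Density of the t(df) distribution."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T >= t) for T ~ t(df), using symmetry about 0 and Simpson's rule."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3          # area to the right of |t|
    return tail if t >= 0 else 1 - tail

def p_value(t, df, alternative):
    """p-value for a one-sample t statistic under the stated alternative."""
    if alternative == "greater":        # Ha: mu > mu0
        return upper_tail(t, df)
    if alternative == "less":           # Ha: mu < mu0
        return 1 - upper_tail(t, df)    # equals P(T <= t)
    if alternative == "two-sided":      # Ha: mu != mu0
        return 2 * upper_tail(abs(t), df)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Hypothetical test statistic and df
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(2.0, 9, alt), 4))
```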
Type I error, Type II error, Power:
No real difference
P(Type I) = α
P(Type II) = β
Power = 1 – β
In order to use t-procedures …
(1) We already know we need an SRS.
(2) Re: normality, though …
(i) If n < 15, data needs to be close to normal and NO OUTLIERS!! (Remember, since the calculation of t involves x̄, which is non-resistant, then t also is non-resistant.)
(ii) If 15 ≤ n ≤ 40, OK to use t so long as data distribution isn’t strongly skewed and there are NO OUTLIERS!!
(iii) If n > 40, then t is fine regardless of the distribution.
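The sample-size guidelines above can be written as a small decision function. This is only a sketch of the rule of thumb in these notes. The function name, the `shape` labels, and the idea of passing in your own judgment about outliers and skew (from a sketch of the data) are all assumptions for illustration.

```python
def t_procedure_ok(n, has_outliers, shape):
    """Rule-of-thumb check for using t procedures.

    shape is your judgment from a plot of the data:
    "roughly normal", "mild skew", or "strong skew".
    """
    if n > 40:
        # Large sample: the notes treat t as fine regardless of the distribution
        return True
    if has_outliers:
        # Below the large-sample cutoff, outliers rule out t procedures
        return False
    if n < 15:
        # Small sample: data must be close to normal
        return shape == "roughly normal"
    # Medium sample: OK unless strongly skewed
    return shape != "strong skew"

# Hypothetical situations
print(t_procedure_ok(10, False, "roughly normal"))
print(t_procedure_ok(25, False, "strong skew"))
print(t_procedure_ok(60, True, "strong skew"))
```

You still have to plot the data yourself. The function only records the decision once you have looked.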
**Look in your text at (ex) 11.7 on p. 636.
Even though t procedures AREN’T resistant to outliers, they are ROBUST against non-normality.
This means that even if the normality assumption is violated, the confidence intervals and p-values are affected very little, provided there are no outliers and the sample isn’t too small.
This is due to the fact that the distribution of x̄ is going to be approximately normal for large n regardless of the data’s distribution (Central Limit Theorem).
Before getting into any examples, let’s look at how the
t-table is different from a z-table.
(1) Values in the body of the table are the t-scores … NOT the p-values!
(2) df are listed down the side.
(3) The area to the right of each t-value is given (NOT the area to the left, like in the z-table). This is the upper-tail probability.
*See the bottom of the table for the confidence level C.
Now some (ex)’s …
(ex.) 11.2 on p. 622 – 623
(ex.) 11.3 on p. 624-625
Let’s also look at the Minitab output for a t-test on p. 627.
Everything we’ve talked about today has involved a single
sample. More often than not, though, you
will use a MATCHED-PAIRS DESIGN instead.
Evidence is more convincing when a comparison or control is involved. (Think back to our 3 principles of good
statistical experimental design:
Randomize, CONTROL, and Replicate.)
A Matched-Pairs design can be …
(ex) 11.4 on p. 629 – 630
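For matched pairs, the usual approach is to take the difference within each pair and run the one-sample t procedure on those differences, testing H0: μ_diff = 0. The sketch below assumes that standard approach. The function name `matched_pairs_t` and the before/after values are made up for illustration and are not from (ex) 11.4.

```python
import math
import statistics

def matched_pairs_t(before, after):
    """Return (t, df) for H0: mean difference = 0, using after - before."""
    if len(before) != len(after):
        raise ValueError("each subject needs both measurements")
    diffs = [a - b for b, a in zip(before, after)]   # one difference per pair
    n = len(diffs)
    dbar = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)      # standard error of the mean difference
    return dbar / se, n - 1

# Hypothetical paired measurements on 5 subjects
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]

t_stat, df = matched_pairs_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

Notice that df is based on the number of PAIRS, not the total number of measurements.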
Re: Power of a t-test … (ex) 11.8 on
p. 639-640
Assignment:
p. 619 – 620 (11.1 – 11.4)
p. 627 – 628 (11.7 – 11.10)
p. 633 – 635 (11.12 – 11.15)
p. 638 – 639 (11.17, 11.20)
p. 640 – 641 (11.21 – 11.23)