Chapter 10: Introduction to Inference
Section 10.1 “Estimating with Confidence”
To infer means to draw a conclusion.
We’ve been drawing conclusions about populations all year based on sample data, but in our shift to FORMAL INFERENCE, we will use probability to take chance variation into account, correct our judgment by calculation, and express the strength of our conclusion.
There are many elaborate statistical techniques available for making inferences. We will focus on the 2 most common:
(i) Confidence Intervals
(ii) Tests of Significance
First & foremost, understand this upon entering the area of inferential statistics: inference is most reliable when the data are obtained from a RANDOM SAMPLE or a RANDOMIZED EXPERIMENT. Without the element of randomness, inference cannot be reliable at all.
*Example 10.2 on p. 537
Sample mean (unbiased estimator): x̄ = 461
We use this sample data to informally infer that, because x̄ is an unbiased estimator of μ, μ is “somewhere around 461.”
Based on Chapter 9, we know that if we took many repeated samples, the sampling distribution of x̄ …
(i) would be approx. normal.
(ii) would have a mean of μ.
(iii) would have a st. dev. of σ/√n.
Let’s say that, from long experience, we know that σ = 100. That means that σ_x̄ = σ/√n = 4.5.
**It is not realistic to assume that we know σ. We’ll learn how to deal with that fact in the next chapter. We just use σ now to illustrate.
We know from the Empirical Rule (the “68-95-99.7” Rule) that in about 95% of all samples, x̄ will fall within 2 standard deviations of μ.
This also means that μ will fall within 2 standard deviations of x̄ in about 95% of all samples.
Since x̄ = 461,
x̄ − 2(4.5) = 461 − 9 = 452
x̄ + 2(4.5) = 461 + 9 = 470
We got these interval boundaries by a method that gives correct results 95% of the time. We can be 95% confident that μ lies in that interval.
We can state this as: “We are 95% confident that the mean SAT Math score is between 452 and 470 points.”
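The arithmetic behind this interval can be sketched in a few lines of Python. The sample size n = 500 is an assumption: the notes give only σ = 100 and a standard deviation of about 4.5 for the sample mean, which is consistent with n = 500. The variable names are illustrative.

```python
import math

# Values from the SAT Math example. n = 500 is an ASSUMPTION chosen
# because 100 / sqrt(500) is about 4.5, matching the notes.
x_bar = 461      # sample mean
sigma = 100      # population standard deviation (treated as known)
n = 500          # assumed sample size

# Standard deviation of the sampling distribution of the sample mean,
# rounded to one decimal place as in the notes
sd_x_bar = round(sigma / math.sqrt(n), 1)

# Empirical Rule: go 2 standard deviations either side of the sample mean
lower = x_bar - 2 * sd_x_bar
upper = x_bar + 2 * sd_x_bar

print(f"standard deviation of the sample mean: {sd_x_bar}")
print(f"approximate 95% interval: {lower:.0f} to {upper:.0f}")
```

The multiplier 2 comes from the Empirical Rule, so this is only an approximate 95% interval.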
This interval is called a 95% CONFIDENCE INTERVAL FOR μ. It has the form: estimate ± margin of error.
The estimate in this case is x̄.
The margin of error is based on the variability of the estimate & on the level of confidence we choose.
*Figure 10.4 on p. 541
Only 1 of the 25 samples did not capture μ in its interval.
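The idea behind the figure (many samples, each giving its own interval, most of which capture the true mean) can be imitated with a small simulation. This is a sketch under stated assumptions, not a reproduction of the textbook figure: the population is taken to be normal, the values of the mean, standard deviation, and sample size are hypothetical, and the function name count_captures is made up for illustration. In real inference the true mean is unknown; the simulation only works because we set it ourselves.

```python
import math
import random

def count_captures(mu, sigma, n, num_samples, z_star=1.96, seed=0):
    """Count how many simulated intervals x-bar +/- z* * sigma/sqrt(n) contain mu."""
    rng = random.Random(seed)
    margin = z_star * sigma / math.sqrt(n)
    captured = 0
    for _ in range(num_samples):
        # Draw one sample of size n from a normal population and average it
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            captured += 1
    return captured

# Hypothetical population values, chosen only for illustration
hits = count_captures(mu=460, sigma=100, n=50, num_samples=25)
print(f"{hits} of 25 intervals captured mu")

# With many more samples, the capture rate should settle near the confidence level
many = count_captures(mu=460, sigma=100, n=50, num_samples=2000, seed=1)
print(f"long-run capture rate: {many / 2000:.3f}")
```

Any single batch of 25 intervals may miss more or fewer than one time; the 95% describes the method over the long run, not any one batch.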
Definition: CONFIDENCE INTERVAL
A level C confidence interval for a parameter is an interval computed from sample data by a method that has probability C of producing an interval containing the value of the parameter.
To be quite sure of our conclusions, a C of .90 or greater (C expressed as a decimal) is often chosen.
Because the sampling distribution of x̄ is approximately normal by the Central Limit Theorem, wanting a 95% confidence level means we want the CENTRAL 95% of the normal sampling distribution.
The leftover tail area (.05 in total, or .025 in each tail, because the area under the curve is 1, and 1 − C = 1 − .95 = .05) represents those few samples that don’t “capture” μ in the interval “estimate ± margin of error.”
That leftover area is denoted by α, so α = .05.
The z values that bound the central 95% (or whatever level you choose) can be called a number of things, depending on what text you use or which statistician you speak to.
We will use z* (z star). You could also say z_C or z_(α/2).
The positive z-value (z*) is referred to as the upper p critical value of the standard normal distribution, where p = (1 − C)/2 is the area to the right of z*.
For a 95% confidence level, we need to find z*. Although z = 2 from the Empirical Rule, we can get a more accurate value from the z-table, the calculator, or Table C in the back of your book.
z* = 1.96
Calculator: 2nd VARS (DISTR), 3:invNorm(, .975, ENTER
(We enter .975 because the area to the left of z* is the central .95 plus the lower tail of .025.)
Suppose we want a 90% confidence level. Find z*.
z* = 1.645
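The same lookup can be sketched in Python with the standard library's NormalDist class, whose inv_cdf method plays the role of invNorm. The helper name z_star is an assumption made for this illustration.

```python
from statistics import NormalDist

def z_star(confidence):
    """Return the critical value z* that bounds a central area of `confidence`."""
    alpha = 1 - confidence              # total leftover tail area
    # The area to the left of z* is the central area plus the lower tail
    return NormalDist().inv_cdf(1 - alpha / 2)

for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.2f}: z* = {z_star(c):.3f}")
```

For C = .90 and C = .95 the results should agree with the 1.645 and 1.96 found above.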
Once the confidence level has been chosen, you can compute the CONFIDENCE INTERVAL FOR A POPULATION MEAN:
Draw an SRS of size n from a population having unknown mean μ and known standard deviation σ.
A level C confidence interval for μ is x̄ ± z* · σ/√n.
(That’s estimate ± margin of error.)
*Think about how this makes sense based on our first example of SAT scores.
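As a check against the SAT example, here is a minimal sketch of the interval "sample mean ± z* · σ/√n" in Python. The function name z_interval is made up for this illustration, and n = 500 is again an assumption (it is consistent with σ/√n ≈ 4.5 but is not stated in these notes).

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) for x-bar +/- z* * sigma / sqrt(n), sigma known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# SAT Math example; n = 500 is an ASSUMPTION, not a value given in the notes
lower, upper = z_interval(x_bar=461, sigma=100, n=500, confidence=0.95)
print(f"95% CI for mu: ({lower:.1f}, {upper:.1f})")
```

Because this uses z* = 1.96 rather than 2, the endpoints differ slightly from the Empirical Rule version but still round to 452 and 470.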
Steps for Creating a Confidence Interval:
(1) Identify population and parameter of interest.
(2) Assess normality (ex: stem plot of sample data)
(3) Conduct inference (i.e. construct confidence interval)
(4) Interpret results
To find a confidence interval using the graphing calculator: STAT, TESTS, 7:ZInterval
*Example 10.5 on p. 546
LARGER SAMPLES give SMALLER MARGINS OF ERROR and therefore NARROWER INTERVALS!
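What else can reduce the margin of error?
Think of fraction behavior in margin of error = z* · σ/√n: a smaller numerator (a smaller z*, meaning a lower confidence level, or a smaller σ) or a larger denominator (a larger n) gives a smaller margin of error.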
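The fraction behavior of the margin of error, z* · σ/√n, can be seen by plugging in a few numbers. The values below (σ = 100, σ = 50, and the sample sizes) are hypothetical and chosen only for illustration; margin_of_error is an illustrative name.

```python
import math

def margin_of_error(z_star, sigma, n):
    """Margin of error z* * sigma / sqrt(n) for a z interval."""
    return z_star * sigma / math.sqrt(n)

# Hypothetical values: sigma = 100 and z* = 1.96 (95% confidence)
for n in (100, 400, 1600):
    print(f"n = {n:4d}: margin = {margin_of_error(1.96, 100, n):.2f}")

# A lower confidence level (smaller z*) or a smaller sigma also shrinks the margin
print(f"90% confidence, n = 100: {margin_of_error(1.645, 100, 100):.2f}")
print(f"sigma = 50,     n = 100: {margin_of_error(1.96, 50, 100):.2f}")
```

Because n sits under a square root, the sample size has to be multiplied by 4 to cut the margin of error in half.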
LASTLY, SOME CAUTIONS …
(1) Data MUST come from an SRS.
(2) The formulas we’ve discussed in this section apply ONLY to SRSs. More complex formulas are needed for other types of samples (stratified, clustered, etc.).
(3) Inference cannot be made from haphazardly collected data with bias of unknown size.
(4) Because x̄ is not resistant to extreme values, the CI (confidence interval) can also be strongly influenced by outliers. If outlier values can be corrected or removed, do so before calculating the CI. (And if you do correct or remove one or more values, then you MUST indicate that in your work and give a reasonable rationale for doing so. Do not just remove extreme values because “it messes up what your data was supposed to look like!”)
(5) Since the interval is based on the sampling distribution of x̄, a non-normal population does not greatly disturb the confidence level so long as n ≥ 15, unless extreme outliers or strong skewness are present.
(6) Understand that the CI margin of error ONLY covers random sampling errors!! (It cannot account for other sources of error such as nonresponse or undercoverage, for example.)
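Caution (4) can be made concrete with a small sketch. The scores below are invented for illustration, with one value mimicking a recording error; the function name z_interval_95 and the choice σ = 100 are also assumptions. The sample is far too small for real inference and is used only to show how one extreme value drags the whole interval.

```python
import math
from statistics import mean

def z_interval_95(data, sigma):
    """95% z interval for the mean of `data`, with sigma treated as known."""
    x_bar = mean(data)
    margin = 1.96 * sigma / math.sqrt(len(data))
    return x_bar - margin, x_bar + margin

# Hypothetical scores, invented for illustration. The last value in
# `with_outlier` mimics a recording error (4800 entered instead of 480).
clean = [450, 470, 455, 465, 460, 480]
with_outlier = [450, 470, 455, 465, 460, 4800]

print("clean data:  ", z_interval_95(clean, sigma=100))
print("with outlier:", z_interval_95(with_outlier, sigma=100))
```

The margin of error is the same in both cases, since it depends only on z*, σ, and n; what moves is the center of the interval, because the sample mean follows the outlier.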