AP Statistics

Chapter 10:  Introduction to Inference

Section 10.1    “Estimating with Confidence”

---

 

To infer means to draw a conclusion.

 

---

 

We’ve been drawing conclusions about populations all year based on sample data,

but in our shift to FORMAL INFERENCE,

we will use probability to take chance variation into account,

correct our judgment by calculation, and express the strength of our conclusion.

 

---

 

There are many elaborate statistical techniques available for making inferences.

 

We will focus on the 2 most common:

(i) Confidence Intervals

(ii) Tests of Significance

 

---

 

The first and most important thing to understand as we enter inferential statistics:

 

Inference is most reliable when the data come from a RANDOM SAMPLE

or from a RANDOMIZED EXPERIMENT.

 

Without the element of randomness, inference cannot be reliable at all.

---

 

*Example 10.2 on p. 537

 

Sample mean $\bar{x}$ (an unbiased estimator of $\mu$) = 461

We use these sample data to informally infer that, because $\bar{x}$ is unbiased, the population mean $\mu$ is “somewhere around 461.”

 

Based on Chapter 9, we know that if we took many repeated samples, the sampling distribution of $\bar{x}$

(i) would be approximately normal,

(ii) would have mean $\mu_{\bar{x}} = \mu$, and

(iii) would have standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

 

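Let’s say that from long experience we know that $\sigma = 100$. With a sample of $n = 500$ students, that means $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 100/\sqrt{500} \approx 4.5$.

**It is not realistic to assume that we know $\sigma$. We’ll learn how to deal with that fact in the next chapter; for now we use a known $\sigma$ only to illustrate the idea.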
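As a quick check of the arithmetic above, here is a minimal Python sketch. It assumes a sample size of n = 500 (consistent with $\sigma/\sqrt{n} \approx 4.5$); the variable names are just for illustration.

```python
import math

# Values from the SAT example: sigma is assumed known ("from long experience"),
# and n = 500 is an assumed sample size consistent with sigma / sqrt(n) of about 4.5.
sigma = 100
n = 500

# Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)
se = sigma / math.sqrt(n)
print(round(se, 2))  # 4.47, which the notes round to 4.5
```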

 

We know from the Empirical Rule (the “68-95-99.7” Rule) that in about 95% of all samples,

$\bar{x}$ will fall within 2 standard deviations ($2 \times 4.5 = 9$ points) of $\mu$.

This also means that, for 95% of all samples, $\mu$ will fall within 2 standard deviations of the sample mean $\bar{x}$.

 

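Since $\bar{x} = 461$,

$\bar{x} - 2\sigma_{\bar{x}} = 461 - 2(4.5) = 452$

$\bar{x} + 2\sigma_{\bar{x}} = 461 + 2(4.5) = 470$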

We got these interval boundaries by a method that gives correct results 95% of the time, so we can be 95% confident that $\mu$ lies in that interval.

 

We can state this as:  “We are 95% confident that the mean SAT Math score is between 452 and 470 points.”
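The interval above can be reproduced with a few lines of Python. This sketch simply plugs in the rounded standard deviation 4.5 and the Empirical Rule multiplier 2 from the notes.

```python
# Empirical Rule interval from the SAT example (values taken from the notes).
x_bar = 461      # sample mean
se = 4.5         # sigma / sqrt(n), rounded as in the notes
z = 2            # "about 95%" multiplier from the 68-95-99.7 rule

margin = z * se
lower, upper = x_bar - margin, x_bar + margin
print(f"{lower} to {upper}")  # 452.0 to 470.0
```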

 

This interval is called a 95% CONFIDENCE INTERVAL FOR $\mu$. It has the form:   estimate $\pm$ margin of error

The estimate in this case is $\bar{x} = 461$, and the margin of error is $2\sigma_{\bar{x}} = 9$.

 

The margin of error is based on the variability of the estimate & on the level of confidence we choose.

 

---

 

*Figure 10.4 on p. 541

In that figure, only 1 of the 25 samples produced an interval that did not capture $\mu$.

 

---

 

Definition: CONFIDENCE INTERVAL

 

A level C confidence interval for a parameter is an interval computed from sample data
by a method that has probability C of producing an interval containing the value of the parameter.
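The definition is about the method's long-run success rate, which a short simulation can make concrete. This sketch assumes a hypothetical normal population with $\mu = 500$ and $\sigma = 100$, draws many SRSs of size 25, and uses 1.96 (the more precise version of the Empirical Rule's 2) as the multiplier. `capture_rate` is a hypothetical helper, not a textbook function.

```python
import math
import random
from statistics import fmean

def capture_rate(mu, sigma, n, z_star=1.96, reps=10_000, seed=1):
    """Fraction of simulated intervals x_bar +/- z_star * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)                  # seeded so the run is repeatable
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # one SRS from the population
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:         # did this interval capture mu?
            hits += 1
    return hits / reps

# Hypothetical population: normal with mu = 500 and sigma = 100.
rate = capture_rate(mu=500, sigma=100, n=25)
print(rate)  # should be close to 0.95
```

In the long run, about 95% of the intervals capture $\mu$, which is exactly what "level C = 0.95" promises about the method (not about any single interval).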

 

---

 

To be quite sure of our conclusions, a C of .90 or greater (C expressed as a decimal) is often chosen.

 

---

 

Because the sampling distribution of $\bar{x}$ is approximately normal by the Central Limit Theorem,
a 95% confidence level means we want the CENTRAL 95% of that normal sampling distribution.

 

 

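The leftover tail area (.05, because the total area under the curve is 1 and $1 - C = 1 - .95 = .05$) represents
those few samples whose interval “estimate $\pm$ margin of error” does not “capture” $\mu$.

That leftover area is denoted by $\alpha$, so $\alpha = .05$. It is split equally between the two tails, so each tail has area $\alpha/2 = .025$.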
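The tail bookkeeping for a confidence level C can be written out in a few lines. A minimal sketch; the names are illustrative only.

```python
# Tail areas for a chosen confidence level C.
C = 0.95
alpha = 1 - C                     # total leftover area in both tails
upper_tail = alpha / 2            # area to the right of z*
area_left = C + alpha / 2         # area to the left of z* (what invNorm needs)
print(round(alpha, 3), round(upper_tail, 3), round(area_left, 3))  # 0.05 0.025 0.975
```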

 

---

 

The z values that bound the 95% (or whatever level you choose) can be called a number of things,
depending on what text you use or which statistician you speak to:

 

We will use z* (z star).

You could also write $z_C$ or $z_{\alpha/2}$.

 

The positive z-value (z*) is referred to as the upper p critical value of the standard normal distribution, where $p = (1 - C)/2 = \alpha/2$ is the area to its right.

 

 

 

For a 95% confidence level, we need to find z*. Although the Empirical Rule gives $z \approx 2$,
we can get a more accurate value from the z-table, the calculator, or Table C in the back of your book.

 

z* = 1.96

 

On the calculator:  2nd VARS (DISTR)

3:invNorm(

.975   (the area to the LEFT of z*: the central .95 plus the lower tail .025)

ENTER
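Python's standard library has an inverse normal function, `statistics.NormalDist().inv_cdf`, that plays the same role as invNorm. This sketch wraps it in a hypothetical helper `z_star` that takes C and feeds in the area to the left of z*, $C + (1 - C)/2$.

```python
from statistics import NormalDist

def z_star(conf):
    """Upper critical value z*: the area to its left is C + (1 - C)/2."""
    return NormalDist().inv_cdf(conf + (1 - conf) / 2)

print(round(z_star(0.95), 2))  # 1.96, same as invNorm(.975)
```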

 

---

 

Suppose we want a 90% confidence level.  Find z*.

 

z* = 1.645   (invNorm(.95), since the area to the left of z* is .90 + .05 = .95)
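Applying the same inverse-normal idea to a few common confidence levels shows that a lower confidence level gives a smaller z*. A sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

# z* for several common confidence levels: area to the left of z* is C + (1 - C)/2.
z_values = {}
for conf in (0.90, 0.95, 0.99):
    z_values[conf] = NormalDist().inv_cdf(conf + (1 - conf) / 2)
    print(f"C = {conf:.2f}: z* = {z_values[conf]:.3f}")
# C = 0.90: z* = 1.645
# C = 0.95: z* = 1.960
# C = 0.99: z* = 2.576
```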

---

 

Once the confidence level has been chosen, you can now compute
the CONFIDENCE INTERVAL FOR A POPULATION MEAN:

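Draw an SRS of size n from a population having unknown mean $\mu$ and known standard deviation $\sigma$.

A level C confidence interval for $\mu$ is $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$, where z* is the critical value for level C.

(That’s estimate $\pm$ margin of error, with margin of error $z^* \sigma/\sqrt{n}$.)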
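The formula translates directly into a short function. This is a sketch assuming $\sigma$ is known and the data are an SRS; `z_interval` is a hypothetical name, and n = 500 for the SAT example is an assumption consistent with $\sigma/\sqrt{n} \approx 4.5$.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf=0.95):
    """Level-C z interval for mu: x_bar +/- z* * sigma / sqrt(n), with sigma known."""
    z_star = NormalDist().inv_cdf(conf + (1 - conf) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# SAT example, assuming n = 500.
low, high = z_interval(x_bar=461, sigma=100, n=500, conf=0.95)
print(f"({low:.1f}, {high:.1f})")  # (452.2, 469.8)
```

With the exact z* = 1.96 in place of 2, the interval comes out slightly narrower than 452 to 470.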

 

*Think about how this makes sense based on our first example of SAT scores.

---

 

Steps for Creating a Confidence Interval:

 

(1) Identify the population and the parameter of interest.

(2) Assess normality (e.g., a stemplot of the sample data).

(3) Conduct inference (i.e., construct the confidence interval).

(4) Interpret the results.

 

---

 

 

To find a confidence interval using the graphing calculator:

 

STAT

TESTS

7:ZInterval
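The calculator's ZInterval accepts either summary statistics or raw data. As a rough Python analogue of the raw-data option, this sketch computes $\bar{x}$ from a list and then applies $\bar{x} \pm z^* \sigma/\sqrt{n}$. The data values and $\sigma = 2$ are made up for illustration; they are not from Example 10.5, and `z_interval_from_data` is a hypothetical name.

```python
import math
from statistics import NormalDist, fmean

def z_interval_from_data(data, sigma, conf=0.95):
    """Rough analogue of ZInterval with raw-data input; sigma must be known."""
    n = len(data)
    x_bar = fmean(data)
    z_star = NormalDist().inv_cdf(conf + (1 - conf) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical measurements, with sigma assumed to be 2.
data = [10.2, 9.8, 11.1, 10.5, 9.6, 10.9, 10.0, 10.4, 9.9, 10.6]
low, high = z_interval_from_data(data, sigma=2, conf=0.90)
print(f"({low:.2f}, {high:.2f})")  # (9.26, 11.34)
```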

 

 

*Example 10.5 on p. 546

 

 

 

LARGER SAMPLES give SMALLER

MARGINS OF ERROR and therefore NARROWER

INTERVALS!
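To see the effect of sample size numerically, this sketch holds $\sigma = 100$ and z* = 1.96 fixed (hypothetical values) and computes the margin of error $z^* \sigma/\sqrt{n}$ for several n.

```python
import math

# Margin of error z* * sigma / sqrt(n) for growing n (95% level, sigma = 100).
z_star, sigma = 1.96, 100
margins = {n: z_star * sigma / math.sqrt(n) for n in (100, 400, 1600)}
for n, m in margins.items():
    print(f"n = {n:4d}: margin of error = {m:.2f}")
# n =  100: margin of error = 19.60
# n =  400: margin of error = 9.80
# n = 1600: margin of error = 4.90
```

Each time n is multiplied by 4, the margin of error is cut in half, because $\sqrt{4} = 2$.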

 

---

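What else can reduce the margin of error?

Think of fraction behavior in $z^* \sigma/\sqrt{n}$: the margin of error shrinks when the numerator gets smaller (a lower confidence level gives a smaller z*, or a smaller $\sigma$) or when the denominator $\sqrt{n}$ gets larger.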
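A sketch comparing the margin of error when each part of the numerator changes, using hypothetical values ($\sigma = 100$, $n = 500$) and a hypothetical helper `margin_of_error`:

```python
import math
from statistics import NormalDist

def margin_of_error(conf, sigma, n):
    """m = z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(conf + (1 - conf) / 2)
    return z_star * sigma / math.sqrt(n)

base = margin_of_error(0.95, 100, 500)
lower_conf = margin_of_error(0.90, 100, 500)    # smaller z* in the numerator
smaller_sigma = margin_of_error(0.95, 50, 500)  # smaller sigma in the numerator
print(round(base, 2), round(lower_conf, 2), round(smaller_sigma, 2))  # 8.77 7.36 4.38
```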

 

 

---

 

LASTLY, SOME CAUTIONS …

 

(1)        Data MUST come from an SRS.

 

(2)        The formulas we’ve discussed in this section apply ONLY to SRS’s.  More complex formulas are needed for other types of samples (stratified, clustered, etc.).

 

(3)        Inference cannot be made from haphazardly collected data with bias of unknown size.

 

(4)        Because $\bar{x}$ is not resistant to extreme values, the CI (confidence interval) can also be strongly influenced by outliers.  If outlier values can be corrected or justifiably removed, do so before calculating the CI.   (And if you do correct or remove one or more values, then you MUST indicate that in your work and give a reasonable rationale for doing so.  Do not just remove extreme values because “it messes up what your data was supposed to look like!”)
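A tiny illustration of why $\bar{x}$, and therefore the center of the CI, is sensitive to an outlier. The data are hypothetical.

```python
from statistics import fmean

# Hypothetical data with and without one extreme value.
clean = [52, 48, 50, 51, 49, 50, 47, 53]
with_outlier = clean + [95]
print(fmean(clean), fmean(with_outlier))  # 50.0 55.0
```

One extreme value moves the mean, and so the whole interval $\bar{x} \pm z^* \sigma/\sqrt{n}$, by 5 units.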
                                   

(5)        Since the interval is based on the sampling distribution of $\bar{x}$, the shape of the population distribution does not greatly affect the CI as long as n ≥ 15 when the population is not normal.

 

(6)        Understand that the CI margin of error covers ONLY random sampling error!!  (It cannot account for errors due to bias, such as nonresponse or undercoverage.)