# Assignment 4 T Tests On Excel

• Two Methods for Making a Statistical Decision
• Steps in Conducting a Hypothesis Test
• Rejection Region Approach to Hypothesis Testing for One Proportion Problem
• P-value Approach to Hypothesis Testing
• Comparing the P-Value Approach to the Rejection Region Approach

An Introduction to Statistical Methods and Data Analysis (see Course Schedule).

### Two Methods for Making a Statistical Decision

There are two approaches for making a statistical decision regarding a null hypothesis: the rejection region approach and the p-value (or probability value) approach. Of the two methods, the p-value approach is more commonly used and reported in published literature. However, understanding the rejection region approach goes a long way toward understanding the p-value method. Regardless of which method is applied, the two approaches always lead to the same conclusion. In explaining these processes in this section of the lesson, we build upon the steps already discussed (i.e. setting up the hypotheses, stating the level of significance α, and calculating the appropriate test statistic).

Let's start out here by having Dr. Wiesner walk through a comparison of the p-value approach with the rejection region approach to hypothesis testing.

Test statistic: The sample statistic one uses to either reject $$H_0$$ (and conclude $$H_a$$) or not reject $$H_0$$.

Critical values: The values of the test statistic that separate the rejection and non-rejection regions.

Rejection region: The set of values of the test statistic that leads to rejection of $$H_0$$.

Non-rejection region: The set of values not in the rejection region, which leads to non-rejection of $$H_0$$.

P-value: The p-value (or probability value) is the probability, under the assumption that the null hypothesis is true, that the test statistic equals the observed value or a more extreme value.

As mentioned previously in this lesson, the logic of hypothesis testing is to reject the null hypothesis if the sample data are not consistent with the null hypothesis. Thus, one rejects the null hypothesis if the observed test statistic is more extreme in the direction of the alternative hypothesis than one can tolerate. The critical values are the boundary values obtained corresponding to the preset α level.

### Steps in Conducting a Hypothesis Test

Although we listed these steps at the beginning of the lesson, we repeat them here for convenience and because the approaches below build on them.

Step 1. Check the conditions necessary to run the selected test, and select the hypotheses for that test:

1. If Z-test for one proportion: $$np_0 \geq 5$$ and $$n(1 - p_0) \geq 5$$
2. If a t-test for one mean: either the data come from an approximately normal distribution or the sample size is at least 30. If neither holds, the data should at least not be heavily skewed and should have no outliers.

If One Proportion Z-test:

| Two-tailed | Right-tailed | Left-tailed |
|---|---|---|
| $$H_0 : p = p_0$$ | $$H_0 : p = p_0$$ | $$H_0 : p = p_0$$ |
| $$H_a : p \ne p_0$$ | $$H_a : p > p_0$$ | $$H_a : p < p_0$$ |

If One Mean t-test:

| Two-tailed | Right-tailed | Left-tailed |
|---|---|---|
| $$H_0 : \mu = \mu_0$$ | $$H_0 : \mu = \mu_0$$ | $$H_0 : \mu = \mu_0$$ |
| $$H_a : \mu \ne \mu_0$$ | $$H_a : \mu > \mu_0$$ | $$H_a : \mu < \mu_0$$ |

Step 2. Decide on the significance level, $$\alpha$$.

Step 3. Compute the value of the test statistic:

If One Proportion Z-test: $$Z^{*}=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$

If One Mean t-test: $$t^{*} = \frac{\bar{x}-\mu_0}{S/\sqrt{n}}$$
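The Step 1 condition check and the Step 3 formulas translate directly into code. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, chosen for illustration, and the sample numbers come from the two worked examples later in this lesson.

```python
import math

def proportion_conditions_met(p0: float, n: int, minimum: float = 5) -> bool:
    """Step 1 check for the one-proportion Z-test: n*p0 >= 5 and n*(1 - p0) >= 5."""
    return n * p0 >= minimum and n * (1 - p0) >= minimum

def z_star_one_proportion(p_hat: float, p0: float, n: int) -> float:
    """Z* = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def t_star_one_mean(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t* = (x_bar - mu0) / (s / sqrt(n)); compare it with a t distribution on n - 1 df."""
    return (x_bar - mu0) / (s / math.sqrt(n))

# Numbers from the Penn State and lumber examples in this lesson.
print(proportion_conditions_met(0.5, 500))                    # True
print(round(z_star_one_proportion(278 / 500, 0.5, 500), 3))   # 2.504
print(round(t_star_one_mean(8.3, 8.5, 1.2, 61), 3))           # -1.302
```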

### Rejection Region Approach to Hypothesis Testing

Step 4. Find the appropriate critical values for the test, using the Z-table for a test of one proportion or the t-table for a test of one mean. REMEMBER: for the one mean test, the degrees of freedom are the sample size minus one (i.e. n - 1). Write down clearly the rejection region for the problem.

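| | One Proportion Z-test | One Mean t-test |
|---|---|---|
| Two-tailed | Reject $$H_0$$ if $$\lvert Z^* \rvert \geq Z_{\alpha/2}$$ | Reject $$H_0$$ if $$\lvert t^* \rvert \geq t_{\alpha/2}$$ |
| Left-tailed | Reject $$H_0$$ if $$Z^* \leq -Z_{\alpha}$$ | Reject $$H_0$$ if $$t^* \leq -t_{\alpha}$$ |
| Right-tailed | Reject $$H_0$$ if $$Z^* \geq Z_{\alpha}$$ | Reject $$H_0$$ if $$t^* \geq t_{\alpha}$$ |

Here $$Z_{\alpha}$$, $$Z_{\alpha/2}$$, $$t_{\alpha}$$ and $$t_{\alpha/2}$$ denote the positive values that cut off an upper-tail area of α (or α/2); for the t-test they are read at df = n - 1.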

Step 5. Check to see if the value of the test statistic falls in the rejection region. If it does, then reject $$H_0$$ (and conclude $$H_a$$). If it does not fall in the rejection region, do not reject $$H_0$$.

Step 6. State the conclusion in words.
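As a rough sketch of Steps 4 and 5, the rejection rules for each tail can be written as a small decision function. This assumes Python's standard `statistics.NormalDist` as a stand-in for the z-table. The standard library has no t distribution, so for the one-mean test the t critical value is simply passed in after looking it up in the t-table (df = n - 1). The function names are hypothetical.

```python
from statistics import NormalDist

def z_upper_critical(alpha: float, tail: str) -> float:
    """Positive critical value: Z_{alpha/2} for a two-tailed test, Z_alpha otherwise."""
    upper_tail_prob = alpha / 2 if tail == "two" else alpha
    return NormalDist().inv_cdf(1 - upper_tail_prob)

def in_rejection_region(stat: float, crit: float, tail: str) -> bool:
    """Step 5 check. `crit` is the positive upper-tail critical value (Z or t),
    so the left-tailed rejection region is stat <= -crit."""
    if tail == "two":
        return abs(stat) >= crit
    if tail == "right":
        return stat >= crit
    if tail == "left":
        return stat <= -crit
    raise ValueError("tail must be 'two', 'right', or 'left'")

print(round(z_upper_critical(0.05, "right"), 3))    # 1.645
print(round(z_upper_critical(0.05, "two"), 3))      # 1.96
print(in_rejection_region(2.504, 1.645, "right"))   # True
print(in_rejection_region(-1.3, 2.660, "two"))      # False (2.660 taken from the t-table)
```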

### P-value Approach to Hypothesis Testing

Steps 1-3. The first three steps (Step 1 to Step 3) are exactly the same as in the rejection region approach.

Step 4. Compute the appropriate p-value based on the alternative hypothesis:

If $$H_a$$ is right-tailed, the p-value is the probability, assuming $$H_0$$ is true, that the test statistic takes a value equal to or greater than the observed test statistic.

If $$H_a$$ is left-tailed, the p-value is the probability, assuming $$H_0$$ is true, that the test statistic takes a value equal to or less than the observed test statistic.

If $$H_a$$ is two-tailed, the p-value is two times the probability, assuming $$H_0$$ is true, that the test statistic takes a value equal to or greater than the absolute value of the observed test statistic.

| | Right-tailed | Left-tailed | Two-tailed |
|---|---|---|---|
| One Proportion Z-test | $$P(Z > Z^*)$$ | $$P(Z < Z^*)$$ | $$2 \times P(Z > \lvert Z^* \rvert)$$ |
| One Mean t-test (df = n - 1) | $$P(t > t^*)$$ | $$P(t < t^*)$$ | $$2 \times P(t > \lvert t^* \rvert)$$ |

Step 5. Check to see if the p-value is less than the stated alpha value.  If it is, then reject $$H_0$$ (and conclude $$H_a$$). If it is not less than alpha, do not reject $$H_0$$.

Step 6. Conclusion in words.
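A minimal sketch of the Step 4 p-value rules and the Step 5 decision for the Z-test, again assuming `statistics.NormalDist` in place of the standard normal table. The t-test row works the same way but needs a t distribution (from the t-table or statistical software), which the Python standard library does not provide. The function names are hypothetical.

```python
from statistics import NormalDist

def z_p_value(z_star: float, tail: str) -> float:
    """p-value for the one-proportion Z-test, following the right/left/two-tailed rules."""
    std = NormalDist()
    if tail == "right":
        return 1 - std.cdf(z_star)               # P(Z > Z*)
    if tail == "left":
        return std.cdf(z_star)                   # P(Z < Z*)
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z_star)))    # 2 * P(Z > |Z*|)
    raise ValueError("tail must be 'right', 'left', or 'two'")

def reject_by_p_value(p_value: float, alpha: float) -> bool:
    """Step 5: reject H0 when the p-value is less than alpha."""
    return p_value < alpha

print(round(z_p_value(2.50, "right"), 4))   # 0.0062
print(round(z_p_value(2.50, "two"), 4))     # 0.0124
```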

Here is Dr. Wiesner working through an example that will help you understand what a p-value is:

### Example:  Penn State Students from Pennsylvania

Continuing with our one-proportion example from the beginning of this lesson, say we take a random sample of 500 Penn State students and find that 278 are from Pennsylvania. Can we conclude that the proportion of Penn State students from Pennsylvania is larger than 0.5 at a 5% level of significance?

#### A: Using the Rejection Region Approach

Step 1. Can we use the one-proportion z-test?

The answer is yes since the hypothesized value $$p_0$$ is 0.5 and we can check that:

$$np_0 = 500 \times 0.5 = 250 \geq 5$$,
$$n(1 - p_0) = 500 \times (1 - 0.5) = 250 \geq 5$$.

Set up the hypotheses. Since the research hypothesis is to check whether the proportion is greater than 0.5, we set it up as a one-tailed (right-tailed) test:

$$H_0: p = 0.5$$
$$H_a: p > 0.5$$

Step 2. Decide on the significance level, $$\alpha$$.

According to the question, $$\alpha$$ = 0.05.

Step 3. Compute the value of the test statistic:

Here $$\hat{p} = 278/500 = 0.556$$, so

\begin{align} Z^{*} &= \frac{\hat{p}-p_0}{\sqrt{\frac{p_0 (1-p_0)}{n}}}\\ &=\frac{0.556-0.5}{\sqrt{\frac{0.5 \cdot (1-0.5)}{500}}}\\ &=2.504\\ \end{align}

Step 4. Find the appropriate critical value for the test using the z-table, and write down clearly the rejection region for the problem. We can use either the standard normal table or the last row of our t-table to find $$Z_{0.05}$$, since the last row (df = $$\infty$$) gives z-values.

From the table, $$Z_{0.05} = 1.645$$, so the critical value is 1.645. The rejection region for the right-tailed test is:

$$Z^* \geq 1.645$$

Step 5. Check whether the value of the test statistic falls in the rejection region. If it does, then reject $$H_0$$ (and conclude $$H_a$$). If it does not fall in the rejection region, do not reject $$H_0$$.

The observed Z-value, our test statistic, is 2.504. Since $$Z^* = 2.504 \geq 1.645$$, it falls in the rejection region, and we reject $$H_0$$.

Step 6. State the conclusion in words.

With a test statistic of 2.504 and critical value of 1.645 at a 5% level of significance, we have enough statistical evidence to reject the null hypothesis.  We conclude that a majority of the students are from Pennsylvania.

#### B: Using the P-value Approach

Steps 1-3. The first three steps (Step 1 to Step 3) are exactly the same as in the rejection region approach.

Step 4. Compute the appropriate p-value based on the alternative hypothesis. Since our alternative hypothesis is right-tailed:

\begin{align} \text{p-value} &= P(Z > Z^{*})\\ &= P \left( Z > \frac {\hat{p}-p_0}{\sqrt{\frac{p_0 (1-p_0)}{n}}} \right) \\ &= P \left( Z > \frac{0.556-0.5}{\sqrt{\frac{0.5(1-0.5)}{500}}} \right) \\ &= P(Z > 2.50)\\ &=0.0062 \end{align}

Step 5. Since p-value = 0.0062 < 0.05 (the α value), we reject the null hypothesis.

Step 6. Conclusion in words:

With a test statistic of 2.504 and p-value of 0.0062, we reject the null hypothesis at a 5% level of significance.  We conclude that a majority of the students are from Pennsylvania.
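To see that both approaches reach the same decision, the sketch below reruns the Penn State example end to end in Python, assuming `statistics.NormalDist` in place of the z-table. Because it uses the unrounded Z* = 2.504 rather than the table's 2.50, the p-value it computes is about 0.0061, slightly smaller than the table value of 0.0062; the decision is unchanged.

```python
import math
from statistics import NormalDist

n, successes, p0, alpha = 500, 278, 0.5, 0.05
std = NormalDist()

# Step 1: conditions for the one-proportion Z-test
conditions_ok = n * p0 >= 5 and n * (1 - p0) >= 5

# Step 3: test statistic
p_hat = successes / n
z_star = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# A: rejection region approach (right-tailed), reject if Z* >= Z_alpha
z_crit = std.inv_cdf(1 - alpha)
reject_by_region = z_star >= z_crit

# B: p-value approach (right-tailed), reject if P(Z > Z*) < alpha
p_value = 1 - std.cdf(z_star)
reject_by_p = p_value < alpha

print(f"Z* = {z_star:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0:", reject_by_region, reject_by_p)
```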

### Example: Length of Lumber

Continuing with our one-mean lumber example from the beginning of this lesson, the mean length of the lumber is supposed to be 8.5 feet. A builder wants to check whether the shipment of lumber she receives has a mean length different from 8.5 feet. Suppose she observes that a sample of 61 pieces of lumber has a mean length of 8.3 feet with a sample standard deviation of 1.2 feet. What will she conclude? Conduct this test at a 1% level of significance.

#### A: Using the Rejection Region Approach

Step 1. Can we use the one-mean t-test?

The answer is yes, since the sample size of 61 is sufficiently large (greater than 30).

Set up the hypotheses. Since the research hypothesis is to check whether the mean length is different from 8.5 feet, we set it up as a two-tailed test:

$$H_0: \mu = 8.5$$
$$H_a: \mu \ne 8.5$$

Step 2. Decide on the significance level, $$\alpha$$.

According to the question, $$\alpha$$ = 0.01.

Step 3. Compute the value of the test statistic:

$$t^{*} = \frac{\bar{x}-\mu_0}{S/\sqrt{n}}=\frac{8.3-8.5}{1.2/\sqrt{61}} \approx -1.3$$

Step 4. Find the appropriate critical values for the test using the t-table. Write down clearly the rejection region for the problem.

From the t-table with $$61 - 1 = 60$$ degrees of freedom, the critical value $$t_{\alpha/2} = t_{0.005}$$ is 2.660. The rejection region for the two-tailed test is:

$$t^* \leq -2.660$$ or $$t^* \geq 2.660$$

Step 5. Check whether the value of the test statistic falls in the rejection region. If it does, then reject $$H_0$$ (and conclude $$H_a$$). If it does not fall in the rejection region, do not reject $$H_0$$.

The observed t-value, our test statistic, is -1.3. Since $$-2.660 < -1.3 < 2.660$$, $$t^*$$ does not fall in the rejection region, so we fail to reject $$H_0$$.

Step 6. State the conclusion in words.

With a test statistic of -1.3 and critical values of ±2.660 at a 1% level of significance, we do not have enough statistical evidence to reject the null hypothesis. We conclude that there is not enough statistical evidence to indicate that the mean length of the lumber differs from 8.5 feet.

#### B: Using the P-value Approach

Steps 1-3. The first three steps (Step 1 to Step 3) are exactly the same as in the rejection region approach.

Step 4. Compute the appropriate p-value based on the alternative hypothesis. Since our alternative hypothesis is two-tailed:

\begin{align}\text{p-value} &= 2\times P(t > |t^{*}|) \\ &= 2\times P \left(t > \left|\frac{\bar{x}-\mu_0}{S/\sqrt{n}}\right| \right) \\ &= 2\times P \left(t > \left|\frac{8.3-8.5}{1.2/\sqrt{61}}\right| \right) \\ &= 2\times P(t > |-1.3|) \\ &= 2\times P(t > 1.3)\\ \end{align}

Step 5. Going across the row for 60 degrees of freedom in the t-table, we do not find a value equal to 1.3. Without software to find a more exact probability, the best we can do from the t-table is to find a range. The value 1.3 falls between 1.296 and 1.671, which correspond to right-tail probabilities of 0.1 and 0.05, respectively. Since 1.3 lies between these two t-values, the probability to the right of 1.3 falls between 0.05 and 0.1. Therefore, the p-value lies between $$2\times 0.05 = 0.1$$ and $$2\times 0.1 = 0.2$$. Since this entire range of possible p-values exceeds our 1% level of significance, we fail to reject the null hypothesis.

Step 6. Conclusion in words:

With a test statistic of -1.3 and a p-value between 0.1 and 0.2, we fail to reject the null hypothesis at a 1% level of significance, since the p-value exceeds our significance level. We conclude that there is not enough statistical evidence to indicate that the mean length of the lumber differs from 8.5 feet.
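Software replaces the t-table's range with an exact p-value. The Python standard library has no t distribution, so the sketch below assumes a small hand-rolled one: the t density is integrated numerically with Simpson's rule, and the critical value is found by bisection. These helpers (`t_pdf`, `t_sf`, `t_upper_critical`) are hypothetical illustrations, not a library API; in practice a statistics package would be used instead.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(x: float, df: int, steps: int = 2000) -> float:
    """P(T > x): Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3          # P(T > |x|)
    return upper if x >= 0 else 1.0 - upper

def t_upper_critical(tail_prob: float, df: int) -> float:
    """Find t_c with P(T > t_c) = tail_prob by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_sf(mid, df) > tail_prob:
            lo = mid                     # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

# Lumber example: two-tailed test of H0: mu = 8.5 at alpha = 0.01
n, x_bar, s, mu0, alpha = 61, 8.3, 1.2, 8.5, 0.01
df = n - 1
t_star = (x_bar - mu0) / (s / math.sqrt(n))

# A: rejection region approach, reject if |t*| >= t_{alpha/2}
t_crit = t_upper_critical(alpha / 2, df)
reject_by_region = abs(t_star) >= t_crit

# B: p-value approach, p = 2 * P(T > |t*|)
p_value = 2 * t_sf(abs(t_star), df)
reject_by_p = p_value < alpha

print(f"t* = {t_star:.3f}, critical value = {t_crit:.3f}, p-value = {p_value:.3f}")
print("Reject H0:", reject_by_region, reject_by_p)
```

On the lumber data, the computed critical value agrees with the table's 2.660, and the two-tailed p-value lands inside the 0.1 to 0.2 range bracketed from the t-table, so both approaches fail to reject $$H_0$$ at α = 0.01.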

### Comparing the P-Value Approach to the Rejection Region Approach

Both approaches will ensure the same conclusion and either one will work. However, using the p-value approach has the following advantages:

1. Using the rejection region approach, you need to look up a new critical value in the table every time the α value changes.
2. In addition to being used to decide whether to reject $$H_0$$ by comparing it with α, the p-value also gives some idea of the strength of the evidence against $$H_0$$: the smaller the p-value, the stronger the evidence.
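The first advantage can be made concrete with the Penn State numbers: the p-value is computed once and compared with each α, while the rejection region approach needs a fresh critical value for every α. The sketch below assumes `statistics.NormalDist` in place of the z-table and checks that the two decisions agree at each α listed.

```python
import math
from statistics import NormalDist

std = NormalDist()
z_star = (278 / 500 - 0.5) / math.sqrt(0.5 * (1 - 0.5) / 500)
p_value = 1 - std.cdf(z_star)            # computed once, reused for every alpha

decisions = {}
for alpha in (0.10, 0.05, 0.01, 0.005, 0.001):
    z_crit = std.inv_cdf(1 - alpha)      # rejection region needs a new critical value
    decisions[alpha] = (z_star >= z_crit, p_value < alpha)
    by_region, by_p = decisions[alpha]
    print(f"alpha={alpha:<6} Z_alpha={z_crit:.3f}  "
          f"reject by region={by_region}  reject by p-value={by_p}")
```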

### Paired Sample t Test

In paired sample hypothesis testing, a sample from the population is chosen and two measurements are taken for each element in the sample. Each set of measurements is considered a sample. Unlike the hypothesis testing studied so far, the two samples are not independent of one another. Paired samples are also called matched samples or repeated measures.

For example, if you want to determine whether drinking a glass of wine or drinking a glass of beer has the same or a different impact on memory, one approach is to take a sample of, say, 40 people, have half of them drink a glass of wine and the other half drink a glass of beer, then give each of the 40 people a memory test and compare the results. This is the approach with independent samples.

Another approach is to take a sample of 20 people and have each person drink a glass of wine and take a memory test, and then have the same people drink a glass of beer and again take a memory test; finally we compare the results. This is the approach used with paired samples.

The advantage of this second approach is that the sample can be smaller. Also, since the sampled subjects are the same for beer and wine, there is less chance that some external factor (confounding variable) will influence the result. The problem with this approach is that the results of the second memory test may be lower simply because the person has consumed more alcohol. This can be corrected by sufficiently separating the tests, e.g. by conducting the test with beer a day after the test with wine.

It is also possible that the order in which people take the tests influences the result (e.g. the subjects learn something on the first test that helps them on the second test, or perhaps taking the test the second time introduces a degree of boredom that lowers the score). One way to address these order effects is to have half the people drink wine on day 1 and beer on day 2, while for the other half the order is reversed (called counterbalancing).

The following table summarizes the advantages of paired samples versus independent samples:

| Paired Samples | Independent Samples |
|---|---|
| Need fewer participants | Fewer problems with fatigue or practice effects |
| Greater control over confounding variables | Participants are less likely to figure out the purpose of the study |

Figure 1 – Comparison of independent and paired samples

Not all experiments can use the paired sample design. For example, if you are testing differences between men and women, then independent samples are necessary.

As you will see from the next example, the analysis of paired samples is made by looking at the difference between the two measurements. As a result, this case uses the same techniques as for the one sample case, although a type 1 TTEST or the paired sample data analysis tool can also be used.
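Since a paired test is a one-sample test on the differences, the sketch below computes the paired t statistic that way. The data are hypothetical before/after weights made up for illustration (they are not the Figure 2 data), and the function names are assumptions, not a library API.

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu0=0.0):
    """t = (mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (mean(data) - mu0) / (stdev(data) / math.sqrt(len(data)))

def paired_t(first, second):
    """Paired t statistic: a one-sample t test of mean 0 on the differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    return one_sample_t(diffs, 0.0)

# Hypothetical weights for five people, before and after a program.
before = [200, 185, 210, 178, 195]
after = [190, 180, 198, 176, 186]
print(round(paired_t(before, after), 3))   # 4.209
```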

Example 1: A clinic provides a program to help their clients lose weight and asks a consumer agency to investigate the effectiveness of the program. The agency takes a sample of 15 people, weighing each person in the sample before the program begins and 3 months later to produce the table in Figure 2. Determine whether the program is effective.

Figure 2 – Data for paired sample example

Let x be the difference between each person's weight before the program and 3 months after it starts (column D of Figure 2). The null hypothesis is:

$$H_0: \mu = 0$$, i.e. any difference in weight is due to chance

We can make the following calculations using the difference column D:

$$\text{s.e.} = \frac{\text{std dev}}{\sqrt{n}} = \frac{6.33}{\sqrt{15}} = 1.6343534$$

$$t_{obs} = \frac{\bar{x} - \mu}{\text{s.e.}} = \frac{10.93 - 0}{1.63} = 6.6896995$$

$$t_{crit} = \text{TINV}(\alpha, df) = \text{TINV}(0.05, 14) = 2.1447867$$

Since $$t_{obs} > t_{crit}$$, we reject the null hypothesis and conclude with 95% confidence that the difference in weight before and after the program is not due solely to chance.

Alternatively we can use a type 1 TTEST to perform the analysis as follows:

p-value = TTEST(B4:B18, C4:C18, 2, 1) = 1.028E-05 < .05 = α

and so once again we reject the null hypothesis.
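The calculation above can be checked in Python from the rounded summary statistics quoted in the text (mean difference 10.93, standard deviation 6.33, n = 15). Because the standard library lacks a t distribution, this sketch assumes hand-rolled helpers (hypothetical names) that integrate the t density with Simpson's rule. Here `t_upper_critical(alpha / 2, df)` plays the role of Excel's two-tailed TINV(α, df), and the doubled tail area plays the role of a two-tailed, paired TTEST. Rounding in the inputs means the last digits differ slightly from the values shown above.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(x: float, df: int, steps: int = 2000) -> float:
    """P(T > x): Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3
    return upper if x >= 0 else 1.0 - upper

def t_upper_critical(tail_prob: float, df: int) -> float:
    """Find t_c with P(T > t_c) = tail_prob by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_sf(mid, df) > tail_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Rounded summary statistics of the difference column D (weight-loss example).
mean_d, sd_d, n, alpha = 10.93, 6.33, 15, 0.05
df = n - 1

se = sd_d / math.sqrt(n)
t_obs = (mean_d - 0) / se
t_crit = t_upper_critical(alpha / 2, df)   # two-tailed critical value, like TINV(0.05, 14)
p_value = 2 * t_sf(t_obs, df)              # two-tailed p-value for the paired test

print(f"s.e. = {se:.4f}, t_obs = {t_obs:.3f}, t_crit = {t_crit:.4f}, p = {p_value:.2e}")
```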

As usual, for the results to be valid, we need to make sure that the assumptions for the t-test hold, namely that the difference measures are normally distributed or at least reasonably symmetric. From Figure 3 we see that this is the case:

Figure 3 – Box Plot for difference measures (column D of Figure 2)

We can also use either Excel’s t-Test: Paired Two Sample for Means data analysis tool or the T Test and Non-parametric Equivalents supplemental data analysis tool to get the same result. The output from the Excel data analysis tool is shown in Figure 4.

Figure 4 – Excel data analysis for paired samples

To use the data analysis version found in the Real Statistics Resource Pack, press Ctrl-m and select T Tests and Non-parametric Equivalents from the menu. A dialog box will appear (as in Figure 3 of Two Sample t Test: Unequal Variances). Enter the input range B3:C18, choose the Column headings included with the data, Paired Samples and T Test options, and press the OK button. The output is shown in Figure 5.

Figure 5 – Real Statistics data analysis for paired samples

We have seen all the items in the above table before with the exception of the Pearson Correlation. This is explored in Correlation.

Observation: Suppose we run the same analysis for the data in Example 1 from Two Sample t Test with Equal Variances using the t-test with independent samples and compare the results with those we obtained for paired samples:

Figure 6 – Excel data analysis for independent samples

We summarize the results from the two analyses as follows:

Figure 7 – Comparison of paired and independent sample t tests

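Note that the mean differences are the same, but the standard deviation for the paired sample case is lower, which results in a higher t-stat and a lower p-value. This is typical whenever the two measurements on each subject are positively correlated, because the variance of the differences is then smaller than the sum of the two separate variances.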
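The effect of pairing can be sketched with hypothetical data. The before/after values below are made up for illustration (they are not the data behind Figures 6 and 7) and are strongly positively correlated. The code compares the paired t statistic with the equal-variance (pooled) independent-samples t statistic on the same numbers: the mean difference is the same in both, but the paired version divides by a much smaller standard error.

```python
import math
from statistics import mean, stdev, variance

def paired_t(first, second):
    """One-sample t on the differences first[i] - second[i] (df = n - 1)."""
    diffs = [a - b for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def pooled_t(first, second):
    """Independent-samples t assuming equal variances (df = n1 + n2 - 2)."""
    n1, n2 = len(first), len(second)
    sp2 = ((n1 - 1) * variance(first) + (n2 - 1) * variance(second)) / (n1 + n2 - 2)
    return (mean(first) - mean(second)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical before/after weights for the same five people.
before = [200, 185, 210, 178, 195]
after = [190, 180, 198, 176, 186]

t_paired = paired_t(before, after)
t_independent = pooled_t(before, after)
print(f"paired t = {t_paired:.3f}, independent t = {t_independent:.3f}")
# paired t = 4.209, independent t = 1.117
```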

Observation: Although we have provided a supplemental data analysis tool for one sample tests, Excel doesn’t provide a standard data analysis tool for this case. The type 1 TTEST and paired samples data analysis tool can, however, be used for the one sample case by simply creating a null paired sample with all zero data.
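A quick sketch of why the zero-column trick works: subtracting zero leaves every observation unchanged, so the paired t statistic equals the one-sample t statistic for $$H_0: \mu = 0$$. The sample values and function names below are hypothetical.

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu0=0.0):
    """t = (mean - mu0) / (s / sqrt(n))."""
    return (mean(data) - mu0) / (stdev(data) / math.sqrt(len(data)))

def paired_t(first, second):
    """Paired t statistic computed from the differences first[i] - second[i]."""
    diffs = [a - b for a, b in zip(first, second)]
    return one_sample_t(diffs, 0.0)

sample = [3.1, -0.4, 2.2, 1.8, 0.9, 2.6]   # hypothetical one-sample data
zeros = [0.0] * len(sample)                # the "null paired sample"

print(one_sample_t(sample, 0.0), paired_t(sample, zeros))
```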

Example 2: Repeat Example 1 of One Sample t-Test using the above observation.

Figure 8 – Use of paired sample data analysis for one sample test

Observation: Since the two sample paired data case is equivalent to the one sample case, we can use the same approaches for calculating effect size and power as we used in One Sample t Test. In particular, Cohen's effect size is

$$d = \frac{\bar{z}}{s_z}$$

where $$z = x_1 - x_2$$. There are other versions of Cohen's effect size, including $$d_{rm}$$ and $$d_{av}$$. These are described at Cohen's d for Paired Samples.
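A minimal sketch of Cohen's effect size for paired samples (often labelled $$d_z$$): the mean of the differences z = x1 - x2 divided by their standard deviation. The before/after lists are hypothetical, and the second calculation simply plugs in the rounded summary statistics from Example 1, so it is an approximation.

```python
from statistics import mean, stdev

def cohens_d_paired(first, second):
    """d = mean(z) / s_z, where z = first[i] - second[i]."""
    z = [a - b for a, b in zip(first, second)]
    return mean(z) / stdev(z)

# Hypothetical paired data.
before = [200, 185, 210, 178, 195]
after = [190, 180, 198, 176, 186]
print(round(cohens_d_paired(before, after), 3))   # 1.882

# Rounded summary statistics from Example 1: mean difference 10.93, std dev 6.33.
d_example = 10.93 / 6.33
print(round(d_example, 3))                        # 1.727
```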