Basic Statistics with Excel

 

1. Descriptive Statistics

 

Use Averages to summarize your data

 

Suppose you’ve done your experiment and you have data on 400 people. Are you going to make a graph with 400 sets of lines on it? Are you going to make a table with 400 rows of data?

 

Of course not! (HINT)

 

You’re going to summarize your data by giving the AVERAGE values.

 

To practice, use the data set suggested by your instructor or the file ‘BP Data’ which is on all the school computers in the BI 231 folder. These instructions assume you are using the 'BP Data' file.

 

1. Descriptive Statistics

 

Averages

Choose the sheet labeled ‘Descriptive Statistics Practice,’ which presents height and systolic blood pressure data for male and female children from 1-17 years of age. What is the average blood pressure for the male children?

 

1. Find the column with the male children’s systolic blood pressure values.

2. Click on the empty cell below the last number in the column

3. Hit the = sign in the bar at the top of the spreadsheet

4. Hit the down arrow to the left of the = sign. Now some options will appear.

5. Click on “AVERAGE”

6. If it is not on the list, click “more functions,” choose “statistical,” then “AVERAGE”

7. A dialog box will appear, asking you what cells you want the average taken over. Doublecheck to make sure they are the cells you want averaged, then click OK.

8. The average will appear! You may forget what it is if you don’t label it, so click on the box just before it and type ‘average’ to remind yourself.

 

When you report an average in your paper, you should also report the number of people you took the average over. It is abbreviated ‘n’. You would write: “The average systolic blood pressure for male children was 119 mm Hg (n=17).” In your data table, you would have a column for the n value.

 

Sex

n

Average Systolic Blood Pressure (mm Hg)

Male

17

119

 

 

Now, what is the average systolic blood pressure for the female children?

 

 

What is the average systolic blood pressure for children 1 year old? This one is a little trickier, because you only want to average two numbers. To do it:

 

1. Click on the empty cell at the end of the row containing the data

3. Hit the = sign in the bar at the top of the spreadsheet

4. Hit the down arrow to the left of the = sign. Now some options will appear.

5. Click on “AVERAGE”

6. If it is not on the list, click “more functions,” choose “statistical,” then “AVERAGE”

7. A dialog box will appear, asking you what cells you want the average taken over. It will give you the wrong cells, in this case! So correct it to contain the two cells you want to average – in this case they are D2,I2. Notice that there is a comma between them, not a colon.

8. Once again, you may forget what the figure in this box means, so you might want to type ‘average’ in the column heading.

 

You would express this in your paper by writing, “The average systolic blood pressure for one-year-olds was 103 mm Hg (n=2).”

 

Now calculate the average blood pressures for all ages. You can do this by clicking on the cell with the average figure for one-year-olds. A border will appear around the cell. Put the cursor on the lower right corner of the border, hold the mouse button down, and drag downwards to fill the column. This will copy the equation you just created into all the other cells in the column.

 

Graph the Average Values to make your data presentation simpler.

 

Create a chart comparing the average blood pressures of male and female children.

 

1. This chart will only contain two values. To select them, click on one value and then hold the  ‘Ctrl’ key down (open apple key on a mac) and click on the second value.

2. When you’ve selected both cells, click on the chart icon on the toolbar.

3. Make a column chart and label its axes. Save it on a new sheet.

4. You’ll see a big problem with this chart – it only shows blood pressure values starting at 116 mm Hg. Doesn’t it look as if boys and girls are very different? Charts that do this are regarded as dishonest, so you want to fix it right away. To do this, doubleclick on the ‘116’ value on the chart.

5. A dialog box will appear, asking you how you want to change that axis. Choose ‘scale.’

6. Set the minimum at 0 and click OK.

 

To label your chart, go to the Chart Menu and select ‘Chart Options’.

 

Is the average good enough? Adding Standard Deviations

 

Which average is more accurate – the average value for male children’s blood pressure, or the average value for one-year-olds’ blood pressure? If you look at the data, you see that the average for male children is a lot less accurate. Several of the children have blood pressures that are quite different from the average. So if you tell your readers the average value, you might be misleading them. How can you give them a better idea of the data, without making them read the whole data set?

 

You do this by giving not only the average but the STANDARD DEVIATION.

 

To calculate the standard deviation, follow the same set of steps you did for the average, but choose STDEV instead of AVERAGE.

 

Calculate the standard deviation for the blood pressures of male and female children and for the blood pressures of children from ages 1 through 17. Be careful to check which cells the computer is calculating the data for, so it doesn’t include your averages with the data.

 

You’ll find that the standard deviation for the male children’s blood pressure is much higher than the standard deviation for one-year-olds’ blood pressure. That means you can trust the average for one-year-olds more than the average for male children.

To present these data to your readers, you would write:

 

“The systolic blood pressure of male children from ages one to 17 averaged 119 mm Hg (n = 17; S.D. = 9.77 mm Hg).”

 

In your data tables, you should give the standard deviation any time you give an average. You do this by giving the value for the average, plus or minus the standard deviation.

 

Sex

n

Average Systolic Blood Pressure (mm Hg)

Male

17

119  +   9.77

 

Your graph should also show the standard deviations. How do you do this?

After you have created your graph with the two bars representing the average blood pressures, doubleclick on one of the bars and choose "Y error bars" from the menu. You'll see:

Choose 'Both' at the top. Now to get the value you'll use for your error bars, choose 'Custom' and click on the little image of the spreadsheet at the right end of the space for entering a '+' value. Now go to your spreadsheet and select ALL THE STANDARD DEVIATION VALUES. If you only select one, that one will be the error bar for all your values; you must select the standard deviations for all the values you want error bars on. Do the same for the '-' value. Then click 'OK'. Your graph will now have error bars!

 

2. Comparative Statistics

 

Statistics aren’t just used to make your data presentation simpler. They’re used to tell whether the difference between two sets of data is significant or not. That is, was it a real difference or just due to chance?

 

To practice with these, go to the second sheet in the BP Data file – the sheet labeled ‘T-test practice data.’ The question you want to answer is, Did treatment A increase blood pressure, or were these changes just due to chance?

 

Looking at the raw data, you see that many of the values increased, but some went down. What happens if you compare the average values? Calculate the average values and standard deviations for initial blood pressure and blood pressure after treatment A, and graph them. Don’t forget to label the averages on your spreadsheet, and to adjust the graph axis to start at 0.

 

It’s not clear whether there was a real change, is there? To find out, we’ll have to use some more sophisticated statistics and do a T-test.

 

A T-test compares two sets of values and gives you the probability that they are just due to chance. Ideally, you want a very low probability – a value less than 0.05. To do a T-test and compare these sets of blood pressures:

 

1. Click on any empty cell

2. Hit the = sign in the bar at the top of the spreadsheet

3. Hit the down arrow to the left of the = sign. Now some options will appear.

4. Click on “TTEST”

6. If it is not on the list, click “more functions,” choose “statistical,” then “TTEST”

7. A dialog box will appear. Click in the box next to “Array 1.”

8. Drag the dialog box out of the way, then highlight your first column of numbers.

9. Click in the box next to “Array 2” and highlight your second column of numbers.

 

To answer the ‘tails’ question, remember your prediction about the two groups. If you predicted that one would be higher than the other, choose 1 tail. If you just predicted that they would be different, pick 2 tails. You can’t change your mind once the data are gathered.

 

To choose the ‘type’ of T test, look at your data. In this case, you have two blood pressure values from each subject. You want to compare them to each other. That is, you don’t want to compare one person’s initial value with somebody else’s final value! So your data are “paired,” so choose option 1.

 

You would use an unpaired T test if you were comparing two unrelated sets of data – for instance, heart rate in snowboarders versus skateboarders. 

 

Now hit ‘OK’ and see what the number is. This is your p-value, the probability that the difference is just due to chance alone. A p-value of 0.05 or less is generally considered statistically significant, and one of 0.05 or greater indicates no real difference between the groups.

 

In your paper, you would express these data by writing: “Although the average systolic blood pressure increased by 6.4 mm Hg after treatment A, the difference was not statistically significant (p= 0.125).”

 

Now compare the initial systolic blood pressures with the systolic blood pressures after treatment B, using a T-test. Does treatment B cause a statistically significant change in blood pressure? How would you express this?

 

TURN IN :

 

1. A sentence that tells what effect treatment B had on systolic blood pressure, and whether it was statistically significant or not.

 

2. A data table showing the average blood pressures, standard deviations and number of subjects for the initial measurements and the two treatments, with a title and caption that explains which treatment caused a statistically significant change and what the p-values were for the two treatments.

 

3. A graph showing the initial systolic blood pressure values and the values after treatments A and B, with a title that explains which treatment caused a statistically significant change and what the p-values were for the two treatments.