| Neville Hunt's Homepage |
| Teaching Statistical Concepts Using Spreadsheets |
by Neville Hunt,
Coventry UniversityAcknowledgementThis article appeared in the Proceedings of the 1995 Conference of the Association of Statistics Lecturers in Universities, published by the Teaching Statistics Trust. IntroductionRarely is anything learned from student feedback questionnaires. As someone once said, there are lies, damned lies and students' feedback. By a law of averages, for every student who describes your classes as heaven on earth there will be another for whom they were a taste of purgatory. It was with some surprise therefore that the author observed virtual unanimity in the feedback from a class of first year undergraduate economists regarding their course in elementary statistics. Questionnaire after questionnaire complained "Why do we have to learn Minitab?". The question was frightening. Did they really want to do their statistics by hand calculation? Surely they did not want to be taught SPSS - and the lecturer rewrite all his course materials? One questionnaire told it straight - "Why can't we use a spreadsheet instead of Minitab?". It has taken some time for the author to appreciate the wisdom of this suggestion. In this paper we consider just what a spreadsheet has to offer students on a first course in statistics. Spreadsheets are natural Twenty years ago, long before the spreadsheet age, the
author was taught to calculate a standard deviation by constructing a
table with columns x, x-
Similarly in least squares regression the natural approach is for students to fit a line to their data, calculate the residual sum of squares, move the line, recalculate, and so on until the minimum is found. Natural though this may be, it is totally impracticable by hand. With a spreadsheet it is simple, as Kowszun (1988) demonstrates. Figure 2 shows a possible layout. What is more, the least squares criterion can be compared with other criteria such as making the residuals sum to zero, or minimising the sum of absolute residuals. The effects of transformations can also be investigated (see Lageard, 1988).
Spreadsheets are transparent So many computer packages are "black boxes" into which students feed data and out of which pours printout by the sackful. A beauty of the spreadsheet is that it can be made almost totally transparent to the user. In the standard deviation spreadsheet in Figure 1 a student can see from the labelling how each quantity is calculated. In a more complicated spreadsheet this may not be so clear. However, by moving the cursor to highlight a particular cell, the user can view the formula by which the cell is calculated. For example, in Figure 3 the formula stored in cell H4 might be written as =D6/SQRT(D5). This can be facilitated further by giving appropriate names to cells or ranges of cells. For example, if D6 is named sigma and D5 is named n, the formula in C9 appears as =sigma/SQRT(n) on the formula bar. Similarly, in Figure 2 the array D3:D8 has been named x which, with cell D4 and B7 named a and b respectively, allows the array formula {=a+b*x} to be entered in F3:F8.
This transparency helps to overcome problems of interpretation associated with computer output. As Tamura (1994) points out, in the context of regression, students can be bewildered by the number of different quantities appearing. Few teachers dare to expect students to learn formulae. The advantage of the spreadsheet is that it does the calculating for them, yet allows them to see the formula being used. Spreadsheets are interactive Whereas many students hesitate to touch a statistical package for fear of the consequences, a spreadsheet clamours to be meddled with. Every true student wants to ask "What if ...?" questions, but often dares not for fear of appearing stupid in front of their peers. Take the spreadsheet in Figure 4 for example. What if I ask for a 0% confidence interval? What if I make n=1000000? What if I make sigma=0? An increasing number of students appear to lack the mathematical skills to answer such questions simply by examining the formulae involved. For such students the spreadsheet provides a powerful investigative tool. One of the aids to investigation is the spreadsheet environment itself. Students can record the results of their experiments elsewhere in the spreadsheet and plot appropriate graphs. For example, using the spreadsheet in Figure 4 a student could record (and subsequently plot) the effect on the margin of error due to changes in n.
Spreadsheets are dynamic The dynamic graphics capability is an important element in investigative work. For example, in Figure 2 if the value of a or b is changed the line moves and all other spreadsheet quantities update immediately. The Binomial distribution can be brought to life by allowing students to plot it for different values of n and p. Similarly, students can get a feel for important continuous distributions such as Normal, Chi-square, F, t and Weibull by observing the effects of altering their parameters. The spreadsheet also offers an ideal environment for simulation experiments. Figure 5 shows a simulation of the National Lottery. The student can select the numbers to put on the ticket, then watch as the lottery draw is made to see how many numbers were correct. Further draws can be made ad infinitum and the student can watch, usually spellbound, to see how infrequently a prize is won! In more complex situations simple transformations of standard rectangular (0,1) random data can be used to simulate the exponential and normal variables which are the essential building blocks. A demonstration of the Central Limit Theorem is also possible.
Spreadsheets are adaptable One of the disadvantages of most statistical packages is the lack of control which the user has over the content and style of the output. The transparency of the spreadsheet means that it can easily be modified by the teacher to conform to a particular teaching strategy. Figure 3 shows the author’s layout for simple linear regression analysis. All the important cells are named in accordance with the author’s preferred notation and the spreadsheet incorporates most of the information normally provided by a statistical package. Off screen to the right are the plots of y against x, residuals against fitted and normal scores. A different teacher might consider much of this output superfluous, in which case the offending cells can simply be deleted. Yet another teacher might consider regression diagnostics to be indispensable, in which case these can be added. Chacun a son gout! Spreadsheets are palatable The most convincing argument for using a spreadsheet package in an introductory statistics course is the fact that it carries no stigma among students, particularly students of business and economics. The spreadsheet, second only to the word-processor, is now recognised as a fundamental part of computer literacy. The statistical package, however user-friendly is alien to them; something new, with strange and unfamiliar commands. The teacher is on the defensive from the start, seeking to persuade them that it is really quite easy when you get used to it. The problem on a service course is that the students rarely do "get used to it". Whereas their spreadsheet skills are exercised in other areas of the curriculum, the statistical package may not get an airing between one class and the next. The very unfamiliarity of the statistical package diverts attention away from the important issue of learning statistical concepts. Students spend longer generating computer output than they do thinking about its interpretation. On the other hand, learning statistics with a spreadsheet helps to reinforce the idea that this is something connected to the real world. This palatability of spreadsheets can also be the salvation of statistics as a discipline on certain service courses. In the authors experience hard-pressed programme managers have no qualms about ditching the statistical content of their programme. However, the offer of a course which includes analytical spreadsheet skills, data gathering skills, word-processing and report writing skills and oral presentation skills, all in the context of statistics, proves irresistible! Spreadsheets are limited The author is not suggesting that a spreadsheet is the answer to every need. Although spreadsheets may be ideal for a first course in statistics for non-specialists, when it comes to serious data analysis a note of caution must be sounded. A spreadsheet is adequate for descriptive statistics, charts, cross-tabulations, simple linear regression, time series decomposition and elementary significance tests, but it is not particularly suited to more advanced analyses. Clearly there is a strong argument for students majoring in statistics to be exposed to an industry standard statistical package. Such students ought to have both the competence and confidence to use a dedicated statistical package successfully. There must also be doubts regarding the ability of a spreadsheet to carry out more sophisticated analyses with sufficient numerical accuracy. It is possible to write macros to perform almost any analysis you wish but, with single-precision arithmetic being used for each calculation, numerical stability becomes a very real concern. Also, programming errors are hard to detect. As a rough guide, once the spreadsheet loses its transparency it is probably not the right tool for the job. Conclusion The author is convinced that the spreadsheet environment is ideally suited to developing statistical concepts, not only at undergraduate level but also at school level. In many schools the budget will not stretch to purchasing a dedicated statistical package, but virtually every school with a computer has access to a spreadsheet package of some kind. With this in mind, the DISCUS project at Coventry University - Discovering Important Statistical Concepts Using Spreadsheets - is dedicated to producing low-budget spreadsheet-based teaching materials for elementary statistics courses. Units have been produced on Descriptive statistics, Probability, Binomial distributions, Poisson distributions, Continuous distributions, Sampling, Regression and Hypothesis testing. Each unit consists of a Microsoft Excel workbook of spreadsheets with accompanying work-cards of student-centred investigations. The complete set of DISCUS materials are available from Coventry University Enterprises Limited priced £23.50 (inclusive of VAT and site licence). References Hunt D N (1996). Spreadsheet Histograms. Teaching Statistics 18,1,10-11. Kowszun, J (1988). Microcomputer Approach to Teaching Linear Regression Using Spreadsheets. Teaching Statistics 10,3,66-71. Lageard J J (1988). Using a Spreadsheet to Teach Coding of Bivariate Data. Teaching Statistics 10,1,20-22. Tamura, H (1994). Model Comparison in Regression. Teaching Statistics 16,2,47-49. |
| All Contents © Neville Hunt 1999 |