Lecture slides     |     Basic equations     |      Text     |     Tools     |     Help
Lab 2: Linear Regression

Error analysis and applications

       
 

Instructions and materials

Download the instructions for computer lab 2

Download the quiz for computer lab 2

Linear regression in Excel

Excel is our basic tool for conducting linear regression in the course. We will also introduce Igor in order to introduce you to a more elegant software package. But, Excel gets the job done. There are two basic tools in Excel. Trendline is a way to obtain a linear regression of data in a plot. This is a good quick way to get an estimate of the slope and intercept. Trendline also gives you the correlation coefficient, R2. However, if you want the errors themselves (i.e. in the slope and intercept) then you need to use LINEST. The segment below shows both of these methods. It also reviews the matrix steps that are behind LINEST (and Trendline). These are given in Lecture 2 as well.

View the application of Trendline and LINEST

pdf

The LS command worksheet

To use the LS command worksheet you will simply need to download the commands

Download commands for the LS worksheet

First paste your data as x,y pairs in columns A and B of an Excel worksheet. Then follow the directions exactly by making a rectangle for the LINEST output and AVERAGE, SUMSQ (sum of squares) and TINV (t-test). Then you will input 5 lines of code that will generate the regression line (column C), the lower 95% confidence limit (column D), the upper 95% confidence limit (column E), the lower 95% prediction limit (column F) and the upper 95% prediction limit (column G). These steps are all shown in the PDF in the link below.

Screen shots of the steps used in the LS Excel spreadsheet

The Q-test worksheet: treating a potential outlier in 1-D data set

The Dixon Q-test is based on a comparison between a calculated value Q = gap/range with the Q value for a given number of data points within the 95% confidence limit. The test assumes that the data conform to a normal distribution. Thus, the idea is that one is comparing a point to the distribution predicted by the remaining points. Given this assumption the test is relatively direct. Calculate the range = highest - lowest and the gap = highest - next highest and take the absolute value of the ratio |gap/range|. This is done mostly automatically in the worksheet below. In the worksheet you still need to paste in the data in column A (starting on A2 and going down). You need to change the number of points in cell F2 so that the MAX and MIN functions go from A2 to A# where # is the number of data points. MAX($A$2:$A$#)-MIN($A$2:$A$#). As long as the number of data points is greater than 10 you do not need to change cell I2, which is set to 0.466, the value appropriate for all data sets greater than 10 points. One case is not considered here. If the outlier is a large negative value then you would need to order the data set from lowest to highest (instead of highest to lowest). The calculation in the DixonQ worksheet is set to use absolute values so that the case of a negative outlier work the same way as a positive one.

Screen shots of the steps used in the Dixon-Q Excel spreadsheet

pdf

Dixon-Q Excel spreadsheet

The Q-test worksheet: treating a potential outlier in 2-D data set

The Dixon Q-test for a line is based on the values in the residuals (residuals = calculated line - data). The column of y values can then be used in a Dixon Q-test in the same way as above. First, calculate Q = gap/range and then compare to the appropriate value for a 95% confidence limit. To test for a negative outlier the worksheet is expanded as shown in the downloads below.

Screen shots of the steps used in the Dixon-Q Excel spreadsheet for residuals

pdf

Dixon-Q Excel spreadsheet for residuals

Using the LS worksheet to calculate residuals

The limit of detection (LOD)

The LOD is determined by drawing a horizontal line from the intersection of the upper 95% confidence limit with the y-axis until it intersects the lower 95% confidence limit. But, often the data do not extend to zero. The worksheet below gives you two ways to use the LS data to extrapolate to zero and obtain an estimate for the LOD.

Screen shots of the steps used to determine the LOD

pdf

The RLS worksheet

The Excel worksheet has been designed as tool for your use throughout the course. It provides several options for error analysis. Perhaps the most important is a linear regression that includes outputs of the standard errors using both least squares and robust statistics. By the end of the lab and Lecture 1 you should understand that robust statistics is useful for determining outliers. Outliers can be excluded from the data set provided you have gone through the appropriate statistical analysis. The RLS worksheet does this, but you should not use it as a "black box". You should understand the various tools provided and make sure you can justify the applications of this worksheet as you perform analyises of the data in various laboratories and experiments in this course.

Download the Excel RLS worksheet