In this report we analyze the relationship between exam performance and seven independent variables: the student’s last high school grade in math, the student’s grade on the statistics II exam, general interest in statistics, how clearly the teacher explained the content, how many hours the student spent studying, how social the student is, and the student’s gender. The goal of this analysis is to evaluate which of these variables best predict performance on the statistics III exam.
To this end, we describe the data and run a multiple regression analysis. The analysis is based on a sample of 200 students, and grades range from 1 to 10, where 1 is the lowest and 10 the highest possible grade.
The SPSS descriptive output shows that the average exam grade of the 200 students was 5,84, with a standard deviation of 1,091. Similarly, the average grade on the last math exam was 5,88 and the average grade on the statistics II exam was 5,87.
The standard deviations of these two grades were similar, at 1,129 and 0,991 respectively. The correlation between the last math grade and exam performance was r = 0,118, and the correlation between the statistics II grade and exam performance was r = 0,217. Interest in statistics had a mean of 5,02, a standard deviation of 1,051 and a correlation with exam performance of r = 0,033. Clarity of teaching had a mean of 6,45 and a standard deviation of 1,078; its correlation with exam performance was r = 0,240. On average, students spent 19,89 hours revising the study material before the exam (standard deviation 1,081), and revising time correlated r = 0,462 with exam performance.
The variable measuring how social a student is ranged from 1 to 7, with a mean of 4,15, a standard deviation of 1,984 and a correlation with exam performance of r = -0,01. Gender had a mean of 0,5, a standard deviation of 0,501 and a correlation of r = 0,11 (Table A1, Table A2).
P1: Two independent variables make a statistically significant contribution to the model at an alpha level of 0.05: the clarity of the teaching and the number of hours a student spent revising the study material. The statistics II exam grade narrowly missed significance, with a p-value of 0.053 (Table B3).
P2: The value of R² is 0.355, which means that the model explains 35.5% of the variance in exam performance (Table B4).
M1: $\hat{y} = -7{,}563 + 0{,}504X_1 + 0{,}295X_2 + 0{,}252X_3$, where $X_1$ = revise, $X_2$ = clear teaching and $X_3$ = statistics II (Table C6)
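As a quick sanity check on M1, the sketch below codes the fitted equation as a Python function, taking X1, X2 and X3 to be revise, clear teaching and statistics II, following the order of the coefficients in M3. The function name `predict_exam` is our own. Because a least-squares fit with an intercept passes through the point of means, plugging in the predictor means from Table A1 should return roughly the mean exam grade of 5,84.

```python
# Coefficients of the final model as reported in Table C6.
INTERCEPT = -7.563
B_REVISE = 0.504   # X1: hours spent revising
B_CLEAR = 0.295    # X2: clarity of teaching
B_STATS2 = 0.252   # X3: statistics II grade


def predict_exam(revise, clear_teaching, stats2):
    """Predicted statistics III exam grade from the fitted final model (hypothetical helper)."""
    return INTERCEPT + B_REVISE * revise + B_CLEAR * clear_teaching + B_STATS2 * stats2


# Plug in the sample means of the three predictors (Table A1).
y_at_means = predict_exam(19.89, 6.45, 5.87)
print(round(y_at_means, 2))  # 5.84
```

The small gap to the exact sample mean comes only from rounding the coefficients.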
M2: The first variable added to the model by the forward method was ‘revise’ (Table B5). ‘Revise’ was entered first because it has the highest correlation with exam performance (r = 0.462; Table A2), so on its own it explains more variance in exam performance than any other single predictor (Table B5). Consistent with this, ‘revise’ also has the largest t-value among the predictors in the final model, making it the most statistically significant predictor (Table C6).
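The choice of the first predictor can be illustrated with the correlations from Table A2. In a one-predictor model R² equals r², and all one-predictor models have the same degrees of freedom, so the predictor with the largest r² also has the smallest p-value and is the one the forward method enters first. This is a sketch of the first step only, not of SPSS’s full procedure.

```python
# Zero-order correlations with exam performance, as reported in Table A2.
correlations = {
    "math grade": 0.118,
    "statistics II": 0.217,
    "interest": 0.033,
    "clear teaching": 0.240,
    "revise": 0.462,
    "social": -0.01,
    "gender": 0.11,
}

# With a single predictor, R^2 equals r^2, so the first forward step
# enters the predictor with the largest squared correlation.
r_squared = {name: r ** 2 for name, r in correlations.items()}
first_entered = max(r_squared, key=r_squared.get)

print(first_entered, round(r_squared[first_entered], 3))  # revise 0.213
```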
M3: The estimated coefficients b1, b2 and b3 are the slopes of the model; each shows how much the dependent variable is expected to change for a one-unit increase in that predictor, holding the other predictors constant. When revise increases by 1 unit, predicted exam performance increases by 0,504, holding clear teaching and statistics II constant; the corresponding t-statistic is 8,62 with 196 df. When clear teaching increases by 1 unit, predicted exam performance increases by 0,295, holding revise and statistics II constant; the corresponding t-statistic is 5,031 with 196 df. When statistics II increases by 1 unit, predicted exam performance increases by 0,252, holding revise and clear teaching constant; the corresponding t-statistic is 3,967 with 196 df (Table C6).
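Each t-statistic in M3 is the slope divided by its standard error, t = b / SE(b), tested on n − k − 1 residual degrees of freedom. The sketch below recovers the standard errors implied by the reported b and t values; these are back-calculated here from rounded numbers, so they only approximate the standard errors in Table C6.

```python
n, k = 200, 3            # sample size and number of predictors in the final model
df_resid = n - k - 1     # residual degrees of freedom used by each t-test

# Reported slopes b and t-statistics for the final model (Table C6).
reported = {
    "revise": (0.504, 8.62),
    "clear teaching": (0.295, 5.031),
    "statistics II": (0.252, 3.967),
}

# Since t = b / SE(b), the implied standard error is SE(b) = b / t.
implied_se = {name: b / t for name, (b, t) in reported.items()}

for name, se in implied_se.items():
    b, t = reported[name]
    print(f"{name}: b = {b}, SE ~ {se:.3f}, t({df_resid}) = {t}")
```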
M4: The squared semi-partial correlations are (0.496)² for revise, (0.290)² for clear teaching and (0.228)² for statistics II (Table C6).
The squared partial correlations are (0,524)² for revise, (0,338)² for clear teaching and (0,273)² for statistics II (Table C6).
The squared semi-partial correlation for revise is (0,496)² ≈ 0,246. It is the proportion of the total variance in exam performance that is explained uniquely by ‘revise’, i.e. by the part of ‘revise’ that does not overlap with the other predictors (Agresti, 2018).
The squared partial correlation for revise is (0,524)² ≈ 0,275. It is the proportion of the variance in exam performance left unexplained by the other predictors that is explained by the unique part of ‘revise’ (Agresti, 2018). Because its denominator excludes the variance already explained by clear teaching and statistics II, the squared partial correlation is larger than the squared semi-partial correlation.
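The two quantities are linked through R². If sr² is the squared semi-partial correlation of a predictor and R² is that of the full model, then the model without that predictor has R² − sr², and the squared partial correlation is pr² = sr² / (1 − (R² − sr²)). The sketch below applies this to the semi-partial correlations in Table C6, taking R² = 0,350 for the final model; with these rounded inputs it recovers partial correlations of about 0,524 for revise and 0,338 for clear teaching.

```python
R2_FULL = 0.350  # R^2 of the final three-predictor model (Table C7)

# Semi-partial ("part") correlations reported in Table C6.
semi_partial = {"revise": 0.496, "clear teaching": 0.290, "statistics II": 0.228}

partial = {}
for name, sr in semi_partial.items():
    sr2 = sr ** 2                 # share of the TOTAL variance unique to this predictor
    r2_without = R2_FULL - sr2    # R^2 of the model with this predictor dropped
    pr2 = sr2 / (1 - r2_without)  # share of the variance the other predictors leave unexplained
    partial[name] = pr2 ** 0.5
    print(f"{name}: sr^2 = {sr2:.3f}, pr^2 = {pr2:.3f}, pr = {partial[name]:.3f}")
```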
M5: The final model explains 35% of the variance (R² = 0,35), or 34% after adjustment (adjusted R² = 0,34) (Table C7). The R² of the final model is slightly lower than the R² of the model in P2 (0,355), because that model included seven predictors while the final model includes only three. Adding a predictor can never decrease R², so even an irrelevant predictor adds a little explained variance; adjusted R² corrects for this by penalizing the number of predictors.
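Adjusted R² makes this trade-off explicit: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of predictors. Applying it to the reported R² values (computed here, not taken from the SPSS tables) gives about 0,331 for the seven-predictor model in P2 and about 0,340 for the final model, so the smaller model comes out ahead once the extra predictors are penalized.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1) for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 200
full = adjusted_r2(0.355, n, k=7)   # model with all seven predictors (P2)
final = adjusted_r2(0.350, n, k=3)  # final forward-selection model (M5)
print(round(full, 3), round(final, 3))  # 0.331 0.34
```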
M6: Yes, the final model is significant overall at an alpha level of 0,05, with F(3, 196) = 35,204 (Table C8).
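The F statistic in M6 and the R² in M5 are two views of the same fit: F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below checks that the reported F(3, 196) = 35,204 is consistent with R² ≈ 0,35; the function names are our own.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def r2_from_f(f, n, k):
    """Invert the relation above: R^2 = F*k / (F*k + n - k - 1)."""
    return f * k / (f * k + (n - k - 1))


print(round(r2_from_f(35.204, n=200, k=3), 3))  # 0.35
print(round(f_from_r2(0.350, n=200, k=3), 1))   # 35.2
```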
M7: The normality assumption requires the residuals to follow a normal distribution, which can be checked with a P-P plot (Agresti, 2018). In our plot the residuals lie close to the straight diagonal line, so the normality assumption appears to be met (Figure D9).
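A normal P-P plot compares, for each sorted residual, its observed cumulative probability with the cumulative probability expected under a normal distribution; points near the diagonal indicate approximate normality. The sketch below computes these coordinates for a handful of hypothetical residuals (not the study’s). It uses the plotting position (i − 0.5)/n, which is one common choice and an assumption here; SPSS may use a slightly different formula.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical residuals for illustration only (not the study's residuals).
residuals = [-1.2, -0.5, -0.1, 0.0, 0.3, 0.6, 0.9]


def pp_points(res):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    normal = NormalDist(mean(res), stdev(res))  # normal distribution fitted to the residuals
    ordered = sorted(res)
    n = len(ordered)
    return [((i - 0.5) / n, normal.cdf(e)) for i, e in enumerate(ordered, start=1)]


pts = pp_points(residuals)
for observed, expected in pts:
    # Under normality, expected should be close to observed (the diagonal).
    print(f"observed {observed:.3f}  expected {expected:.3f}")
```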
The homoscedasticity assumption requires the residuals to have constant variance across the predicted values (Agresti, 2018). The residual plot shows a random cloud, with no systematic increase or decrease in the spread of the residuals, so the variance appears constant and the homoscedasticity assumption is met.
The linearity assumption holds if the residual plot shows no systematic pattern, such as a curve (Agresti, 2018). Our residual plot shows only a random cloud, so the linearity assumption holds (Figure D10).
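A1: The mean of the unstandardized residuals is 0. This is guaranteed by least-squares estimation: whenever the model includes an intercept, the residuals sum to zero.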
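The zero mean follows from the least-squares normal equation for the intercept, which forces the residuals to sum to zero. The sketch below illustrates this for a simple regression on hypothetical toy data (not the study’s data); the same property holds for the three-predictor model because it also includes an intercept.

```python
from statistics import mean

# Hypothetical toy data (not the study's data): hours revised and exam grade.
x = [15.0, 18.0, 20.0, 22.0, 25.0]
y = [4.5, 5.0, 6.5, 6.0, 7.5]

# Ordinary least squares for a simple regression with an intercept.
x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Residuals = observed minus fitted values; their mean is zero up to rounding error.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(abs(mean(residuals)) < 1e-9)  # True
```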
C1: How can you predict your next exam grade? In our latest research we studied 200 students and examined several factors that might affect their exam performance. Of the possible predictors we looked at, three turned out to play an important role. The best predictor of your next grade is the number of hours you spend revising the study material: students who revise more tend to get better results on the final exam. Furthermore, the better the teacher explains the material, the more likely you are to get a good grade. Finally, the higher your grade on a previous statistics exam, the better you are likely to do on the next one. Interest in the subject, gender, high school math grades and how social a student is did not help to predict the grade.
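C2: One disadvantage of the forward method concerns multicollinearity. When predictors are strongly correlated with each other, a predictor entered at an early step can become largely redundant once a correlated predictor is added, but the forward method never removes predictors that are already in the model. As a result, the final model may contain predictors that are not that relevant (Brace, Kemp & Snelgar, 2016).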
Agresti, A. (2018). Statistical methods for the social sciences (5th ed.). Boston, MA: Pearson.
Brace, N., Kemp, R., & Snelgar, R. (2016). SPSS for psychologists (and everybody else). New York, NY: Routledge.