
Knowledge Surveys in General Chemistry: Confidence, Overconfidence, and Performance

Priscilla Bell*,† and David Volckmann‡

Departments of †Chemistry and ‡Psychology, Whittier College, Whittier, California 90601, United States

ABSTRACT: Knowledge surveys have been used in a number of fields to assess changes in students' understanding of their own learning and to assist students in review. This study compares the metacognitive confidence ratings of students faced with problems on the surveys with their actual knowledge as shown on the final exams in two courses of general chemistry (Chem 110A and Chem 110B). The surveys were administered at the start and end of each course and correlated with the final exam scores. The surveys and final exams were found to be reliable, and the relatively high correlations between them suggested that students' confidence ratings on knowledge surveys were valid reflections of their actual knowledge. Students scoring high on the exams estimated their knowledge with greater accuracy than the lower-scoring students, who overestimated their knowledge. This phenomenon reflected the Dunning-Kruger effect, and the methodology of knowledge surveys isolated students' efficacy expectations, not outcome expectations, as the likely origin of the effect. Finding remedial interventions to improve metacognitive skills for lower-scoring, overconfident students poses a continuing problem.

KEYWORDS: First-Year Undergraduate/General, Upper-Division Undergraduate, Chemical Education Research, Curriculum

FEATURE: Chemical Education Research

Educators have many tools available for formative and summative evaluation of student learning. In addition to standard testing formats, knowledge surveys (KS) have emerged as tools for students to analyze their understanding of specific course content and for faculty to organize their course curricula.1 On these surveys, students face questions of varying difficulty and cognitive complexity according to Bloom's taxonomy,2 and they are prompted to assign one of three levels of confidence to each question:

a. I have confidence in answering this question.
b. I could answer 50% of the question or know where to get the information quickly.
c. I have no confidence in answering the question.

By honestly assigning one of the three levels of response, students should be able to determine quickly the areas in which they excel and the areas that will need to be stressed in their review of the material. Another benefit to students comes from the transparency of the faculty's expectations of the knowledge and skill sets required for the entire course.3 An overview of knowledge survey construction can be found at the MERLOT ELIXR Web site.4

Students' self-assessment of their understanding prompted by knowledge surveys involves metacognition, people's knowledge about their own knowledge.5 Rickey and Stacy noted the importance of metacognition in the problem-solving abilities of chemistry students,6 which is one of the ideas that stimulated the current study of knowledge surveys in the general chemistry course. One of the more comprehensive definitions of metacognition comes from Gourgey (quoted in refs 7 and 8) and relates directly to the purposes of the knowledge survey outlined above:

[A]wareness of how one learns; awareness of when one does or does not understand; knowledge of how to use available information to achieve a goal; ability to judge the cognitive demands of a particular task; knowledge of what strategies to use for what purposes; and assessment of one's progress both during and after performance.

Knowledge surveys created for this study focus on all but the first component of this definition, whereas most of the currently published chemistry studies on metacognition place a greater degree of emphasis on this first component.7,9-12 For example, Sandi-Urena, Cooper, and Stevens developed an inventory to assess the planning, evaluating, and monitoring of problem-solving skills of students, calling these areas "metacognitive skillfulness", which are generally useful to any person solving problems in any context.10



By contrast, knowledge surveys monitor student confidence in their own specific problem-solving skills and knowledge directly related to course content.

Knowledge surveys have emerged as tools for faculty curriculum development and for enhancement of student understanding of specific course content.1 The usefulness of knowledge surveys to faculty and students rests on their validity. That is, do student responses to knowledge survey questions accurately measure students' actual knowledge of course content? The main way to answer this question is to correlate knowledge survey results with measures of student performance in the course. Few studies have reported these data. In a geology course (N = 15), Wirth and Perkins3 measured correlation coefficients in the range of r = 0.54-0.79 between survey scores and test or course grades. In contrast, Bowers,13 using a survey for five sections of an introductory biology course (total N = 336), found much lower correlations between survey scores and course grades (r = 0.21-0.46). The current study in chemistry classes explores the validity of knowledge surveys in some detail.

No published knowledge surveys are available in the field of chemistry, although they have been reported for a number of other fields, including statistics,14 geology,3,15,16 and biology.13 This study began with the development of two knowledge surveys for the two sequential semesters of general chemistry taught at a small liberal arts institution during the fall semesters of 2005-2007 (Chem 110A) and the spring semesters of 2007-2008 (Chem 110B). Initial interest in the knowledge surveys was stimulated by their potential effectiveness as instructional aids. Subsequent analysis allowed the authors to investigate the relationships between students' responses on the knowledge surveys and their performance on course exams. Of particular interest was the extent to which students' scores on the knowledge survey would reflect their performance on exams.

Kruger, Dunning, and their colleagues have published several papers showing how metacognitive ratings of performance are likely to vary greatly among individuals with varying skill levels.17,18 These authors, as well as others, have noted that people tend to overestimate their performance, and this error in estimation is greatest among the weakest performers.7,8,17-19 The data collected for this study were therefore analyzed to explore this possibility as well. The examination of data from three classes of students in Chem 110A enabled a compelling analysis for addressing the issues above. An independent study of a smaller set of data from Chem 110B served to replicate the findings from Chem 110A.

METHODOLOGY

Chem 110A

The first author, who taught all three classes of Chem 110A reported in this study, created a 126-question knowledge survey covering the same topics as the eventual test questions, though none were identical to them. The survey questions were selected from midterm examinations written by the first author with Bloom's levels2 in mind. The resulting distribution of cognitive complexity of questions on the survey reflected that of the actual final examinations (see the Supporting Information for samples of matched questions from the knowledge survey and final exams). For the Chem 110A survey, the questions were distributed roughly as follows according to Bloom's taxonomy:2 knowledge (12%), comprehension (21%), application (27%), analysis (19%), synthesis (6%), and evaluation (5%).


Both verbal and written instructions prompted students to circle one of three responses directly on the survey, labeled a, b, or c, indicating their level of confidence in answering the question. Response "a" was to be chosen if students were confident that they could answer the question sufficiently well for graded test purposes. The "b" response was to be selected if they could answer at least 50% of the question or knew precisely where they could quickly get the information (within 20 min) and then could complete the answer for graded test purposes. The "c" answer was to be chosen if they were not confident that they could adequately answer the question for graded purposes.

The knowledge surveys were administered in the first class meeting of the course before any instruction had taken place (preKS). Students were not timed, and most completed the survey in approximately 30 min. The instructor distributed copies of the survey to students following its first administration. Subsequently, the instructor highlighted the questions from the knowledge survey associated with each test in the course and posted answers online the day before the exam to allow students to use the knowledge survey for review. This was done for all tests, and the complete set of answers was available before the final exam. In the last class period of each course, 4 days prior to the final exam, the instructor administered the same knowledge survey (postKS), announcing that the results would be tabulated after class grades were assigned. Students were required to identify themselves on each administration of the survey but were informed that their identity would not be conveyed to the instructor until after grades were assigned. This mode of administration was implemented to decrease the likelihood of the self-enhancement motive described by Mabe and West.20 Although the survey in this study was administered in paper-and-pencil form, it is well suited for online testing and analysis, as seen in the geology knowledge surveys developed by Wirth and Perkins.3

In the three classes of Chem 110A, 166 students completed both the pre- and postinstruction knowledge surveys (N2005 = 46, N2006 = 62, N2007 = 58). Students were assigned an identification number, and each of their responses was entered into a spreadsheet using these values: a = 100, b = 50, and c = 0. This created an average KS scale of 0-100, whose range is equivalent to the range of the final exam scores, enabling a direct comparison between the scales. Data from two Chem 110A students were eliminated from the analysis because their responses did not change throughout the survey and their level of choice was not consistent with their ability (e.g., a confident student with A grades giving nearly all KS responses as "c"), which left a sample of 164 students.

Data that had been collected on the class intake forms were also analyzed, including class standing and previous classes taken. The latter data were grouped in the following categories: no high school chemistry; one year of high school chemistry; two years of high school chemistry; advanced placement/honors chemistry; and college-level chemistry. In addition, test scores, final exam scores, and class total points were added to the spreadsheet. Statistical tests were performed using PASW Statistics 18.0. Protocols for IRB were approved by the human subjects committee of the institution.
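As an illustration only (not the authors' actual spreadsheet or software), the coding scheme described above, in which responses a, b, and c are assigned 100, 50, and 0 and averaged to a 0-100 KS score, might be computed as in the following Python sketch; the function name and the example responses are hypothetical.

```python
# Minimal sketch (not the study's actual spreadsheet): convert a student's
# knowledge-survey responses (a/b/c) to the numeric codes used in the study
# (a = 100, b = 50, c = 0) and average them onto a 0-100 KS scale.
RESPONSE_VALUES = {"a": 100, "b": 50, "c": 0}

def ks_score(responses):
    """Average numeric value of a list of 'a'/'b'/'c' responses (0-100 scale)."""
    values = [RESPONSE_VALUES[r] for r in responses]
    return sum(values) / len(values)

# Hypothetical student: 126 responses, matching the length of the Chem 110A survey.
example_responses = ["a"] * 60 + ["b"] * 40 + ["c"] * 26
print(round(ks_score(example_responses), 1))  # (60*100 + 40*50) / 126 ≈ 63.5
```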

Chem 110B

A parallel study was conducted with 47 students taking the second semester of general chemistry. One of the students eliminated from the first study was likewise eliminated from this study, resulting in a sample of 46 students.


As before, the first author assembled a 77-question survey for Chem 110B based on topics in the second semester of general chemistry. This survey was field-tested in her colleague's Chem 110B class prior to the year she taught the course. The questions, according to Bloom's levels,2 were distributed as follows: knowledge (16%), comprehension (21%), application (40%), analysis (14%), synthesis (6%), and evaluation (3%). The surveys were administered, and the statistical analyses performed, in the same way as for Chem 110A.

RESULTS AND DISCUSSION

Chem 110A Reliability of Instruments

Reliability measures of the preKS and postKS for each of the two knowledge surveys were obtained using Cronbach's α with an Excel program supplied by Nuhfer;21 the results reflect the internal consistency of the test items. The reliability for each administration of the survey was quite high. Cronbach's α values for preKS and postKS for the combined Chem 110A data were α = 0.976 (α2005 = 0.990; α2006 = 0.982; α2007 = 0.977) and α = 0.964 (α2005 = 0.927; α2006 = 0.975; α2007 = 0.941), respectively. The final exams were also examined for reliability, resulting in Cronbach's α values of 0.886, 0.871, and 0.807 for the three sections of Chem 110A.
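For readers who wish to run a similar reliability check, the sketch below is a generic implementation of Cronbach's α (alpha = k/(k-1) × (1 − sum of item variances / variance of total scores)); it is not Nuhfer's Excel program, and the toy data are invented for illustration.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_students x n_items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_var = scores.var(axis=0, ddof=1)        # variance of each item across students
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
    return (k / (k - 1)) * (1 - item_var.sum() / total_var)

# Hypothetical toy data: 5 students x 4 survey items coded 0/50/100.
toy = [[100, 100,  50, 100],
       [ 50, 100,  50,  50],
       [  0,  50,   0,  50],
       [100,  50, 100, 100],
       [  0,   0,  50,   0]]
print(round(cronbach_alpha(toy), 3))
```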

Chem 110A Data Analysis

The validity of the Chem 110A knowledge survey as a measure of students' knowledge in the course was assessed by comparing the postknowledge survey scores (postKS) to the final exam for the course. First, a more targeted subset of questions on the survey was selected to match the actual questions on the final exam (this focused subset is designated postKS*). Correlations between postKS* and the final exam for the three years were r = 0.537, 0.666, and 0.496 (p < 0.001), respectively. The overall correlation between the postKS* and the final exam was thus fairly strong (r = +0.556, p < 0.001), indicating that the postKS* was generally a valid measure of student knowledge at the end of the course. That is, the distributions of students' confidence ratings assessing their knowledge of the course content generally matched the distributions of their scores on the final exam taken four days later. Furthermore, the correlation between the full knowledge survey, postKS, and the final exam was nearly identical (r = +0.555, p < 0.001), no doubt owing to the fact that the entire knowledge survey is highly internally consistent.

Posttest correlations with the final exam of around r = 0.56 compare favorably with those found by Wirth and Perkins,3 who reported r values in the range of 0.56-0.68. Such values should be considered relatively high. On the basis of reviewing more than 1300 articles and books on student ratings of teaching, Cashin proposed that in the social sciences:22

[C]orrelations between 0.20 to 0.49 are practically useful. Correlations between 0.50 and 0.70 are very useful but they are rare when studying complex phenomenon.

It is possible that the high correlations between knowledge survey and final exam scores were a result of ordinary student characteristics, such as previous chemistry knowledge or testing experience. For instance, students may respond consistently in both testing formats. If these characteristics were especially influential, both the pre- and postknowledge surveys should correlate with the final exam about equally, thereby discounting the contribution of student learning through the course. The overall correlation between the preKS and the final exam, though significant, is only r = 0.190 (p < 0.05).


The individual classes had r values of 0.121, 0.250, and 0.143 (p > 0.05), respectively, in contrast to the postKS overall correlation of r = 0.556 and individual class correlations of r = 0.537, 0.666, and 0.496 (p < 0.001). The dramatic increase in the correlation coefficients highlights the effect of the course on student responses on the survey. Additionally, the influence of students' previous chemistry knowledge or testing experience, as well as response biases on the knowledge survey, may be minimized by controlling for preKS values in a partial correlation of postKS with the final exam. In our analysis, the change in the postKS correlation with the final exam was negligible (r = 0.556 versus a partial correlation of r = 0.536 controlling for preKS).

The preKS did not correlate strongly with any course midterms (r < 0.232, p > 0.05). Conversely, correlations between postKS and midterms generally increased as the course progressed and were especially high for Test 4 (r = 0.600) and Test 5 (r = 0.668). These tests were given 3 weeks and 1 week before the end of the course, respectively, and covered 40% of the topics on the knowledge survey. These higher correlations may reflect the enhancement of the students' metacognition on these topics as a result of the feedback received on the exams.

The level of student preparation was considered to be a possible contributor to the variation in knowledge survey results. As expected, the means of the preKS were higher for students with more extensive chemistry experience (preKS means were 15, 23, 34, 35, and 39, respectively, for students with no chemistry, 1 year, 2 years, advanced placement/honors chemistry, and college chemistry experience). A nonparametric test was chosen to compare the distributions of these groups because of the large variation in group size. A Kruskal-Wallis test showed that different levels of student preparation (excluding the group of only two students who had no prior chemistry) produced significantly different preKS distributions (p < 0.001). Interestingly, the postKS distributions for the different levels of preparation were also significantly different (p < 0.05). So for the first course in general chemistry, previous familiarity with the material led to differences in confidence ratings not only at the beginning but also at the end of the course.

On the basis of Mabe and West's20 analysis of metacognition, class standing was expected to be related to KS scores, because more mature students might have had other experiences with self-evaluation and might therefore be better at assessing their own knowledge. Their likely completion of additional courses involving problem-solving skills should also have enhanced their scores.6,23 However, no significant differences between students of different class standings were obtained.
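The partial correlation mentioned above (postKS with the final exam, controlling for preKS) can be obtained from the three pairwise Pearson correlations. The sketch below uses hypothetical, randomly generated score arrays; it is not the study's PASW analysis.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, z):
    """First-order partial correlation of x and y controlling for z."""
    r_xy = stats.pearsonr(x, y)[0]
    r_xz = stats.pearsonr(x, z)[0]
    r_yz = stats.pearsonr(y, z)[0]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical score arrays (one entry per student); not the study's raw data.
rng = np.random.default_rng(0)
pre_ks = rng.uniform(0, 60, 164)
final_exam = rng.uniform(30, 100, 164)
post_ks = 0.6 * final_exam + 0.1 * pre_ks + rng.normal(0, 10, 164)

print(stats.pearsonr(post_ks, final_exam)[0])     # zero-order correlation
print(partial_corr(post_ks, final_exam, pre_ks))  # correlation controlling for preKS
```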

Student Self-Estimates and the Dunning-Kruger Effect

One of the goals of this paper was to explore the correlation between students' subjective confidence ratings on the knowledge survey and the more objective measure of student knowledge on the final exam. In the discussion that follows, it is assumed that there is a rough equivalency between these scales, which have the same 0-100 ranges. The values assigned to the response alternatives on the knowledge survey (0, 50, or 100) were chosen on the basis of how particular degrees of knowledge might be evaluated on a summative test.

Chem 110A Discussion

Figure 1, representing data from 164 students, shows the correspondence between each student's Chem 110A final exam score (ranked from low to high, creating a monotonically increasing line) and his or her associated postKS* score.



Figure 1. Chem 110A 2005-2007 postKS* scores and final exam scores ranked by students' final exam score.

The postKS* scores are joined with straight lines to make it easier to see the student-to-student variations in confidence. Inspecting the figure, one can see that the between-student variability in postKS* scores is relatively uniform across the range of final exam scores. The postKS* standard deviations for students in the bottom, middle, and top third of final exam scores were 14.3, 11.9, and 10.7, respectively. These slightly decreasing levels of variability could easily be accounted for by the depressive ceiling effect on scores as they approach the maximum of 100. It seems, then, that students with the highest exam scores are just about as variable at gauging their course knowledge as students with the lowest exam scores.

More interesting in Figure 1 is the progressive change in students' ability to gauge their own knowledge over the entire range of final exam scores. For example, among the roughly one-third of students who scored lowest on the exam, only eight underestimated their performance; the rest of that third overestimated their knowledge as they approached the final exam, many by a wide margin. On the other hand, the top third of the students produced much more accurate estimations of their ability, although the majority of them (28) underestimated their ability relative to the exam.

To test this progressive change of the postKS* as a measure of student learning over the range of final exam scores, we established a variable that measures students' subjective knowledge estimates relative to the objective knowledge measurement (EstΔ*) by calculating the difference between the postKS* and the final exam score for each student. The correlation of this variable with the final exam score was very strong (r = -0.586), reinforcing the notion that there is a systematic improvement in self-assessment from weaker to stronger students. This negative correlation reflects the inverse relationship between EstΔ* and the final exam score, indicating the large overestimation by weak students and the relatively more accurate estimation (or even underestimation) by the stronger students.

A simplified view of the relationships among postKS* scores and final exam scores (with preKS for comparison) is presented in Figure 2, in which the students are arbitrarily divided into three groups according to their level of performance on the final exam. The overestimation on the postKS* by the weaker students and the more accurate estimation by the strongest students is now quite obvious.
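A minimal sketch of the EstΔ* calculation described above, the signed difference between a student's postKS* score and final exam score (both on a 0-100 scale), is shown below with hypothetical placeholder data; a negative correlation of EstΔ* with the final exam score corresponds to the pattern reported in the text.

```python
import numpy as np
from scipy import stats

# Hypothetical placeholder scores, not the study data.
post_ks_star = np.array([85.0, 60.0, 45.0, 92.0, 70.0])
final_exam   = np.array([90.0, 48.0, 30.0, 95.0, 62.0])

est_delta = post_ks_star - final_exam   # > 0 means the student overestimated
r, p = stats.pearsonr(est_delta, final_exam)
print(est_delta)     # per-student over/underestimation
print(round(r, 3))   # a negative r corresponds to the Dunning-Kruger pattern
```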


Figure 2. Chem 110A preKS, postKS*, and final exam scores sorted by student groupings.

Figure 3. Chem 110A 2005-2007 data showing the relationship between mean EstΔ* values and final exam scores.

The relationships shown are nearly identical to those repeatedly displayed in the research of Dunning, Kruger, and their colleagues and others.8,17-19 The relationship, the so-called Dunning-Kruger effect, has been replicated many times under various experimental and real-world conditions. Generally, people with less competence show a positive bias when rating themselves or their performance as above average. In fact, more than half of them will tend to rate their competence or their performance well "above average", a notion that does not make obvious sense. The more incompetent the individual, the greater this positive bias becomes; that is, the more incompetent people are, the greater the difference between their self-assessment and their actual ability. Extremely competent people, on the other hand, make more accurate self-assessments and may even show a negative bias.17

The mean EstΔ* is essentially the average over- or underestimation bias of the students. Figure 3 focuses on the significant change of EstΔ* across the three arbitrarily divided groups of final exam scores (F[2, 161] = 26.5, p < 0.001). Students in the lower third of the test scores overestimated their knowledge by the largest margin (by an average of 18%), indicating that they are the least effective in judging their own ability accurately. The middle third also overestimated their knowledge, but to a much lesser degree (by an average of 7.5%). The students scoring in the top third were the most realistic, with a slightly pessimistic underestimation (by an average of 0.11%). These data indicate that, for the upper two-thirds of the students, their average postKS* score in Chem 110A estimated their understanding of course content on the final exam to within one grade variation (10%).
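The grouping into thirds by final exam score and the comparison of mean EstΔ* across those groups (the F test quoted above) could be carried out as in the following sketch; the arrays, random seed, and effect size are hypothetical stand-ins, not the Chem 110A data.

```python
import numpy as np
from scipy import stats

def tertile_means(est_delta, final_exam):
    """Mean EstDelta* for the bottom, middle, and top third of final exam scores."""
    order = np.argsort(final_exam)
    thirds = np.array_split(order, 3)            # bottom, middle, top third by rank
    groups = [est_delta[idx] for idx in thirds]
    means = [g.mean() for g in groups]
    f_stat, p_value = stats.f_oneway(*groups)    # one-way comparison across the groups
    return means, f_stat, p_value

# Hypothetical arrays standing in for the 164 Chem 110A students.
rng = np.random.default_rng(1)
final_exam = rng.uniform(30, 100, 164)
est_delta = 25 - 0.3 * final_exam + rng.normal(0, 8, 164)  # overestimation shrinking with score

print(tertile_means(est_delta, final_exam))
```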



Figure 4. Chem 110B 2008 postKS* and final exam scores ranked by final exam score.



Figure 5. Chem 110B 2008 preKS, postKS*, and final exam scores sorted by student groupings.

Chem 110B Reliability of Instruments

Cronbach's α values for preKS and postKS for the combined data of students in Chem 110B were again quite high, at α = 0.972 (α2007 = 0.963 and α2008 = 0.978) and α = 0.957 (α2007 = 0.957 and α2008 = 0.962), respectively. Cronbach's α for the final exam in 2008 was 0.857.

Chem 110B Data Analysis

For Chem 110B, taught by the first author, the correlation between postKS* and final exam score was slightly higher than that of Chem 110A reported above (r = 0.571 versus r = 0.556), replicating the close association between student confidence ratings on the knowledge surveys and summative assessments of student knowledge on the exam. The correlation between the overall postKS (which includes all 77 questions) and the final exam score was even higher (r = 0.583), confirming again the internal consistency of the knowledge survey. Unlike the Chem 110A survey, neither the preKS nor the postKS scores were significantly related to students' various levels of preparation for Chem 110B according to a Kruskal-Wallis analysis of the distributions. These results can be explained by the fact that the spring topics (kinetics, equilibrium, and electrochemistry) are covered less extensively in high school and in college preparation courses and would therefore have much less influence on students' knowledge of the course topics. Again, class standing did not relate significantly to the survey variables.

Figure 4 again demonstrates considerable student-to-student variation in postKS* scores over the range of student abilities in Chem 110B. Additionally, as before, the students showing the least ability on the final exam showed the greatest tendency to overestimate their competence on the postKS*, whereas students showing moderate or excellent command of the subject matter of the course had postKS* scores closer to their eventual final exam scores. This time the negative correlation of EstΔ* with the final exam score (r = -0.605) was even stronger than the Chem 110A result, presumably illustrating again the strength of the Dunning-Kruger effect in these data. Figures 5 and 6 show simplified views of the effect, comparing the postKS* and EstΔ* scores of students exhibiting low, medium, and high levels of ability on the final exam (preKS values are included for comparison).

Figure 6. Chem 110B data showing the relationship between mean EstΔ* values and final exam scores.

These data are remarkably similar not only to the data from the three sections of Chem 110A but also to those from nearly all of the replicated research studies conducted by Dunning, Kruger, and their colleagues.17,18,24 The lowest third of the students overestimated their performance by appreciably more than a letter grade on average.

Mabe and West predicted that students would become better estimators of their knowledge with increasing familiarity with knowledge survey testing.20 Our results did not support this prediction. In fact, students' ability to assess their knowledge grew worse from Chem 110A to Chem 110B. Because the distribution of EstΔ* for Chem 110A had large skew and kurtosis, the comparison of EstΔ* between the two courses required nonparametric statistics. The median EstΔ* increased from 4.9 to 10.8 from Chem 110A to Chem 110B for the 44 students who had taken both courses, a marginally significant increase in overconfidence as they moved from the lower-level course to the higher-level course according to a Wilcoxon signed-rank test (p = 0.082). The mean postKS* scores decreased from 76.4 to 67.2, but the mean final exam scores decreased even more, from 70.7 to 57.3, so the increase in EstΔ* is due to the lower final exam scores, amplified, no doubt, by the Dunning-Kruger effect.
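A sketch of the paired nonparametric comparison described above (skew and kurtosis checks followed by a Wilcoxon signed-rank test on EstΔ* for students who took both courses) is given below; the data are simulated placeholders, not the 44 matched students.

```python
import numpy as np
from scipy import stats

# Hypothetical paired EstDelta* values for students who took both courses.
rng = np.random.default_rng(2)
est_delta_110a = rng.normal(5, 12, 44)
est_delta_110b = est_delta_110a + rng.normal(5, 10, 44)   # somewhat larger in 110B

# Skew and kurtosis of the 110A distribution motivate a nonparametric test.
print(stats.skew(est_delta_110a), stats.kurtosis(est_delta_110a))

stat, p = stats.wilcoxon(est_delta_110a, est_delta_110b)  # paired signed-rank test
print(np.median(est_delta_110a), np.median(est_delta_110b), p)
```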




It is instructive to compare the above results for Chem 110B taught by the first author (2008) with data obtained from the same course offered the previous year. It was noted above that the knowledge survey written by the first author for Chem 110B had originally been field-tested in a course taught by another faculty member in the chemistry department (N = 41). Although the original course covered identical topics in the same order with the same textbook, there were obvious differences in the administration of the course, especially with regard to the use of the knowledge surveys. First, the surveys were not integrated into the course as fully as in the 2008 course discussed above. The preKS was delivered during the first class period, but throughout the course students never had access to the survey questions, answers, or discussion of the answers. The survey was again delivered on the last day of the course, 4 days before the final exam. Furthermore, owing to differences in the teaching and examination styles of the two instructors, the final exams in the two courses bore no resemblance to one another. The format and type of questions asked on the final exam in the first author's course were similar to those found on the knowledge survey; the other instructor's final exam included many fewer and much more global questions. (No measure of the reliability of this instrument was available.)

Given these major differences between the courses, one might expect results involving the knowledge surveys to differ as well. In fact, the correlations between postKS and the final exam were quite different (r = 0.374 for the original field-tested course versus r = 0.583 for the course conducted by the first author and reported above). Thus, the scores on the knowledge surveys were much less reflective of student performance on the final exam in the original course. However, more similarities than differences were found in the data from the two courses. In the original course, the means for preKS and postKS were 18.0 and 63.3; in the subsequent course, they were 23.9 and 64.5. The mean final exam scores were even closer for the two courses, 57.7 and 57.5. Finally, as Figure 7 shows, plotting the mean knowledge survey scores against the mean final exam scores for the lowest-, middle-, and highest-scoring student groups produces the familiar Dunning-Kruger pattern, in which less competent individuals exaggerate their ability relative to their final performance evaluation.

Figure 7. Chem 110B 2007 preKS, postKS, and final exam scores sorted by student groupings.

It is interesting that the students scoring highest on the final exam in the original Chem 110B course show in their knowledge survey scores such a dramatic underestimation of their competence, over one letter grade lower than their ultimate final exam average. At least three possible reasons might explain this extreme discrepancy: (i) the format of the more global final exam had little resemblance to the format of the more detailed knowledge surveys; (ii) the students in this course had minimal experience with the knowledge surveys, and so the most discerning students were reluctant to express confidence in their knowledge of the material on the postKS; and (iii) the students who eventually scored highest on the final exam, having taken the postKS, discovered what their greatest weaknesses were and studied the areas that required the most review during the 4 days they had to study for the final exam.

CAUSES OF THE DUNNING-KRUGER EFFECT

Why would the poorest performers display such glaring overconfidence? In their original analysis, Kruger and Dunning17 suggested simply that people with little skill or knowledge, first, are not in a position to know how to answer questions and, second, do not have the metacognition to recognize and gauge how deficient they are. These authors argue that the skills necessary to make an accurate metacognitive judgment about whether one's answers or analyses are correct are the same skills required to come up with the correct analyses in the first place. People incapable of arriving at correct answers would also be incapable of exercising any metacognitive discrimination that could help distinguish which answers are correct and which are incorrect.

Two potential alternatives to the above argument are that people with little ability produce inaccurate self-evaluations either because they desire to save face (for themselves or for others) or because they simply do not care much about their own self-evaluations. However, research by Ehrlinger et al.18 demonstrates that neither of these explanations has merit. In one experiment, subjects were offered a substantial extrinsic reward ($100) if their self-evaluations were completely accurate, and in another experiment students were threatened with a personally embarrassing situation (an interview by an expert professor) if they submitted inaccurate self-evaluations. In neither case did the sizable overconfidence in their knowledge disappear; in fact, their naturally inflated self-evaluations tended to become even more pronounced in the face of strong opposing motivational pressures. It appears, then, that students who do not know much about a subject are much too confident about the material for their own good. The ordinary feedback that they should be obtaining by completing a knowledge survey is in large part wasted on them. Therefore, even though knowledge surveys might be useful in identifying students according to their competence in the class, it is hypothesized that knowledge surveys may not be helpful instructional resources for the majority of students in the lowest third of the class.

In previous research, very competent individuals have tended to underestimate their performance (see ref 24 for a review). In most of this research, subjects had been asked to estimate their performance in comparison to others completing the same task. It is likely that the underestimation arises because competent individuals erroneously assume that most everyone else is as competent as they are at the task. So when asked, for example, in what percentile their performance would fall, they are reluctant to indicate the highest percentiles. However, whenever subjects were asked to estimate their raw performance, the underestimation diminished or disappeared altogether. In the present research, it should be remembered that subjects were asked on their knowledge surveys neither to compare their performance with others nor to guess at their raw scores.


Rather, on knowledge surveys students are simply asked to gauge the extent to which they might be able to answer the questions (regardless of how well other students might perform). The minimal underestimation found among the best students, who had the knowledge survey available for review in Chem 110A and Chem 110B, is likely because they were not making judgments about their ability in comparison to other students. The slight underestimation found could easily be accounted for by individual error variance around relatively accurate self-evaluations.

One further intriguing idea highlights the differences between the methodology of knowledge surveys and the typical methodology used to illustrate the Dunning-Kruger effect. In Bandura's25,26 conception of personal control, two types of expectations relate to personal action: efficacy expectations and outcome expectations. The central metacognitive question for efficacy expectations is, "Can I do it?" For example, a student responding to a knowledge survey question might ask, "Can I solve this problem?" or "Do I have the skill to set up and perform this titration?" The central question for outcome expectations is, "Will what I do work?" For example, "Will I get credit for the right answer to this problem?" or "Will the indicator change when it should?"

Typically in research that illustrates the Dunning-Kruger effect, subjects are first expected to perform a task (such as take a test or engage in a debate). Only afterward are they asked how they believe their performance will measure up to some standard and how it will rank compared to the performance of other subjects on the same task. In this customary methodology, then, subjects have always been asked about both their efficacy and their outcome expectations after the performance has been completed. Therefore, according to Bandura's model, either type of expectation (or both) might be the basis for the over- or underevaluation of subjects in the typical research design; the two types of expectation are theoretically confounded in the design of the research. However, in the methodology for research on knowledge surveys, only one kind of expectation appears to be prompted: efficacy expectation ("Can I do this?"), not outcome expectation ("How will my answer be scored?" or "How will my answer compare to others in the course?"). The fact that the Dunning-Kruger effect can be elicited using the methodology of knowledge surveys supports the idea that the effect derives largely from subjects' metacognition about efficacy expectations and not outcome expectations.

CONCLUSIONS

The replicated data in this study show several important general characteristics of knowledge surveys used in a yearlong sequence of courses in general chemistry. First, the knowledge surveys created for both semesters were shown to be reliable measures of student responses according to an analysis using Cronbach's α. Furthermore, the significant correlations between students' knowledge survey scores and their final exam scores show that the surveys are valid indicators of student knowledge of the course content and of their skills at handling problems of general chemistry by the end of the course. Variation in student scores on the knowledge surveys was relatively consistent across all levels of student performance on the final exam. Furthermore, the average knowledge survey score for the upper two-thirds of the students in both semesters was within 10% (one grade level) of the final exam score.


On average, the best students estimated their knowledge on the final exam almost perfectly in classes where knowledge surveys were always available for review, but there was a progressive tendency for weaker and weaker students to overestimate their knowledge. This result fits the description of the Dunning-Kruger effect, which has been shown in other recent research to be robust across several domains of self-evaluation. The finding of overestimation by the weaker students is expected to be a general phenomenon that affects the results of knowledge surveys in any discipline. Although better students, who tend to estimate more accurately, appear to receive instructional aid from the use of knowledge surveys for review, it is hypothesized that students who persist in overestimating their own competence are unable to benefit from the ordinary use of knowledge surveys and require other models of intervention (e.g., see ref 10).

The unique methodology of knowledge surveys makes it clear that the Dunning-Kruger effect found in this study stems directly from students' judgments about their own mastery of the problems and assignments of the course and not from their expectations about the grades that they might be assigned on their work. That is, this study suggests that the Dunning-Kruger effect is based upon efficacy expectations independent of outcome expectations.

Users of knowledge surveys will want to develop a way to counsel individual students. One way might be to determine a score below which students should be concerned about their potential difficulty in the class. While this might alarm some fraction of the better students, it would make the group who tend to overestimate their knowledge aware that they are in jeopardy. For example, if a postKS cutoff score of 74 were used for Chem 110A, approximately 77% of the students in the upper and lower thirds of the class would be correctly advised about their performance on the final exam. Administering the knowledge survey midway through the course could permit a score to be determined that would prompt students to make changes in their approach to reviewing for the course. Other authors have proposed methods for early identification of overestimating students to trigger intervention.7,8 Once these students have been identified, it is not at all clear what types of interventions will be successful with them. Sandi-Urena, Cooper, and Stevens, for example, have recently proposed a multistep intervention that increases not only metacognitive sensitivity in students but their problem-solving skills as well.15 Additional research is needed to determine whether this kind of intervention would help this group of students. Evaluating the effect of online administration of the survey, as well as determining the value of more frequent delivery (before midterms), would be worthy topics of additional study, as these changes might enhance the effectiveness of the students' metacognition. The role of instructional modes and testing styles, as well as the influence of survey authorship, would also be areas for additional study.
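As a final illustration, the advising-cutoff idea discussed above (e.g., a postKS score of 74 for Chem 110A) could be evaluated against a data set with a sketch like the one below; the arrays and the accuracy definition are assumptions for demonstration, not the authors' procedure.

```python
import numpy as np

def advising_accuracy(post_ks, final_exam, cutoff):
    """Fraction of upper- and lower-third students correctly flagged by a postKS cutoff.

    A lower-third student counts as correctly advised if postKS < cutoff (flagged
    as at risk); an upper-third student counts as correctly advised if postKS >= cutoff.
    """
    order = np.argsort(final_exam)
    n = len(order)
    lower, upper = order[: n // 3], order[-(n // 3):]
    correct = (post_ks[lower] < cutoff).sum() + (post_ks[upper] >= cutoff).sum()
    return correct / (len(lower) + len(upper))

# Hypothetical arrays; a cutoff near 74 is the value discussed in the text.
rng = np.random.default_rng(3)
final_exam = rng.uniform(30, 100, 164)
post_ks = 0.6 * final_exam + 30 + rng.normal(0, 10, 164)
print(round(advising_accuracy(post_ks, final_exam, cutoff=74), 2))
```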

ASSOCIATED CONTENT


Supporting Information

Table of matched questions from Chem 110A final exams and knowledge surveys; Knowledge Surveys for Chem 110A and Chem 110B. This material is available via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].




ACKNOWLEDGMENT

The authors wish to thank Lori Smith for processing the raw knowledge surveys, as well as the Faculty of Whittier College for a Research Grant. We thank Edward Nuhfer for his invaluable help in calculating and verifying Cronbach's α and for his constructive assistance with the manuscript.

REFERENCES

(1) Nuhfer, E. B.; Knipp, D. The Knowledge Survey: A Tool for All Reasons. To Improve the Academy 2003, 21, 50–78.
(2) Taxonomy of Educational Objectives; The Classification of Educational Goals: Handbook 1: Cognitive Domain; Bloom, B. S., Ed.; Longmans, Green: New York, 1956.
(3) Wirth, K. R.; Perkins, D. Knowledge Surveys: An Indispensable Course Design and Assessment Tool. Presented at Innovations in the Scholarship of Teaching and Learning at Liberal Arts Colleges, St. Olaf, MN, 2005.
(4) MERLOT ELIXR Home Page. http://elixr.merlot.org/ (accessed Aug 2011).
(5) Weingart, F. E. In Metacognition, Motivation, and Understanding; Weingart, F. E., Kluwe, R. F., Eds.; Lawrence Erlbaum Associates: Hillsdale, NJ, 1987; p 8.
(6) Rickey, D.; Stacy, A. M. J. Chem. Educ. 2000, 77, 915–920.
(7) Cooper, M. M.; Sandi-Urena, S.; Stevens, R. Chem. Educ. Res. Pract. 2008, 9, 18–24.
(8) Potgieter, M.; Ackermann, M.; Fletcher, L. Chem. Educ. Res. Pract. 2010, 11, 17–24.
(9) Cooper, M. M.; Sandi-Urena, S. J. Chem. Educ. 2009, 86, 240–245.
(10) Sandi-Urena, S.; Cooper, M. M.; Stevens, R. Int. J. Sci. Educ. 2010, 1, 1–18.
(11) Schraw, G.; Brooks, D. W.; Crippen, K. J. J. Chem. Educ. 2005, 82, 637–640.
(12) Tsai, C. J. Chem. Educ. 2001, 78, 970–974.
(13) Bowers, N.; Brandon, M.; Hill, C. Cell Biol. Educ. 2005, 4, 311–322.
(14) Jordan, J. J. Stat. Educ. 2007, 15 (2); http://www.amstat.org/publications/jse/v15n2/jordan.html (accessed Aug 2011).
(15) Knipp, D. Knowledge Surveys: What Do Students Bring To and Take From a Class? http://web.archive.org/web/20100529084647/http://www.isu.edu/ctl/facultydev/KnowS_files/KnippUSAFA/KSKNIPPUSAFA.html (accessed Aug 2011).
(16) Nuhfer, E. B. J. Geosci. Educ. 1996, 44 (4), 385–394.
(17) Kruger, J.; Dunning, D. J. Personality Soc. Psych. 1999, 77, 1121–1134.
(18) Ehrlinger, J.; Johnson, K.; Banner, M.; Dunning, D.; Kruger, J. Org. Behav. Hum. Decis. Process. 2008, 105, 98–121.
(19) Issacson, R. M.; Fujita, F. J. Scholarship Teach. Learn. 2006, 6, 39–55.
(20) Mabe, P. A.; West, S. G. J. Appl. Psych. 1982, 67, 280–296.
(21) Nuhfer, E. Private communication, June 4, 2010.
(22) Cashin, W. E. Student Ratings of Teaching: A Summary of the Research; IDEA Technical Report No. 20; Center for Faculty Evaluation and Development, Kansas State University: Manhattan, KS, 1988; pp 1–6.
(23) Antonietti, A.; Ignazi, S.; Perego, P. Br. J. Educ. Psychol. 2000, 70, 1–16.
(24) Dunning, D. Self-Insight: Roadblocks and Detours on the Path to Knowing Thyself (Essays in Social Psychology); Psychology Press: New York, 2005.
(25) Bandura, A. Self-Efficacy: The Exercise of Control; W. H. Freeman: New York, 1997.
(26) Bandura, A. Psychol. Rev. 1977, 84, 191–215.

