Dr. Dave Dempsey, Prof. of Meteorology, Dept. of Geosciences, College of Science, ddempsey@sundog.sfsu.edu
Dr. Kathleen O'Sullivan, Professor, Dept. of Secondary Education, College of Education, kaosul@sfsu.edu
Dr. Lisa White, Assoc. Prof. of Geology, Dept. of Geosciences, College of Science, lwhite@sfsu.edu
| Class | GM310.00 (NOVA course) | GM310.01 (NOVA course) | GM103 | G302 | M302S | M302M |
| Number of Students Scored | 4 | 4 | 6 | 19 | 18 | 5 |
The statements in the first two categories are taken directly from the TOSRA. We wrote the statements in the third category based very closely on the "Nature and History of Science" content standards from the National Research Council's National Science Education Standards (NSES). We wrote the statements in the fourth category to find out what students feel or understand about the nature of climate, who studies it, and how it is studied.
For each statement, students are asked to indicate whether they strongly agree, agree, don't know how they feel, disagree, or strongly disagree. Statements are paired to increase the reliability of the scores, with the statements in each pair saying basically the same thing but one phrased positively and the other negatively. For each of the four categories of statements, there are 10 statements in 5 pairs, for a total of 40 statements. The sequence of 40 statements is organized so that the first four statements are drawn from the four categories in the order listed above, and each set of four statements thereafter is similarly selected and ordered. There is no systematic ordering of statements based on whether they are positively or negatively phrased.
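The layout of the 40 statements can be sketched in a few lines of Python. This is only an illustration: the interleaving of categories follows the description above, but the numeric weights (5 for "strongly agree" down to 1 for "strongly disagree", reversed for negatively phrased statements) and all function names are assumptions, not details taken from the instrument itself.

```python
# Sketch of the 40-statement layout; scoring weights are ASSUMED, not from the source.
CATEGORIES = 4
STATEMENTS = 40

# Assumed Likert weights for a positively phrased statement.
LIKERT = {
    "strongly agree": 5,
    "agree": 4,
    "don't know": 3,
    "disagree": 2,
    "strongly disagree": 1,
}


def category_of(statement_number: int) -> int:
    """Return the category (1-4) of a statement numbered 1-40.

    Statements cycle through the four categories in order, so
    statements 1, 5, 9, ... belong to category 1, and so on.
    """
    if not 1 <= statement_number <= STATEMENTS:
        raise ValueError("statement number must be in 1..40")
    return (statement_number - 1) % CATEGORIES + 1


def score_response(response: str, positively_phrased: bool) -> int:
    """Score one response, reversing the scale for negatively phrased items."""
    weight = LIKERT[response]
    return weight if positively_phrased else 6 - weight


def category_totals(responses, phrasing):
    """Sum scores per category.

    responses: list of 40 response strings, in statement order.
    phrasing:  list of 40 booleans, True where the statement is positive.
    """
    totals = {c: 0 for c in range(1, CATEGORIES + 1)}
    for n, (resp, positive) in enumerate(zip(responses, phrasing), start=1):
        totals[category_of(n)] += score_response(resp, positive)
    return totals
```

Under these assumed weights, each category total ranges from 10 to 50, and a student who answers both statements of a pair consistently contributes the same score from each.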
We administered the assessment on the first day of class (the "pre-test") and again in the last week or so of the semester (the "post-test"). We scored only assessments completed by students on both dates, and we scored both sets of responses only after the end of the semester. (To match each pair of pre- and post-test responses, we used the last four digits of each student's nine-digit student number, which each student was supposed to write on the response form. We did not score single, unmatched assessments.) To decrease the likelihood of miscoring, two people independently scored every response, and any major differences between the two sets of scores were resolved by rescoring.
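The matching step described above might look like the following sketch. The function name and data layout are hypothetical, and dropping keys shared by more than one form is an added assumption; the text does not say how such collisions, if any occurred, were handled.

```python
def match_pre_post(pre, post):
    """Pair pre- and post-test forms by the last four digits of the student number.

    pre, post: lists of (student_number, responses) tuples, where
    student_number is the string the student wrote on the form.
    Returns {key: (pre_responses, post_responses)} for keys found in both sets.
    """

    def by_key(forms):
        keyed = {}
        ambiguous = set()
        for student_number, responses in forms:
            key = student_number[-4:]
            if key in keyed:
                # ASSUMPTION: two forms sharing a key cannot be told apart,
                # so both are excluded from matching.
                ambiguous.add(key)
            keyed[key] = responses
        for key in ambiguous:
            del keyed[key]
        return keyed

    pre_keyed, post_keyed = by_key(pre), by_key(post)
    shared = pre_keyed.keys() & post_keyed.keys()
    # Unmatched forms (present on only one date) are silently left out.
    return {key: (pre_keyed[key], post_keyed[key]) for key in sorted(shared)}
```

Only the pairs returned by such a function would go on to be scored, which mirrors the rule that single, unmatched assessments were not scored.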
The first two types of tests above were also performed on the pre-test scores alone, to test hypotheses that the students in GM310 were initially no different from students in the four GE classes, that students in the four GE classes did not differ significantly from each other, and that the students in the two GM310 classes did not differ significantly from each other (at least as measured by these assessments).
| Hypothesis Tested | Type of Test | Score | Result of Test (5% significance level) |
| (a) Mean scores are the same among all six classes. | F-test | pre-test only | Accept: all four categories |
| | | post/pre test difference | Accept: Categories 1, 2, 3; Reject: Category 4 |
| (b) Mean scores are the same among all classes, with GM310 classes lumped together. | F-test | pre-test only | Accept: all four categories |
| | | post/pre test difference | Accept: Categories 1, 2; Reject: Categories 3, 4 |
| (c) Mean scores are the same among the four GE classes. | F-test | pre-test only | Accept: all four categories |
| | | post/pre test difference | Accept: Categories 1, 2, 4; Reject: Category 3 |
| (d) Mean scores for GM310 lumped together and for all four GE classes lumped together are the same. | t-test | pre-test only | Accept: all four categories |
| | | post/pre test difference | Accept: Categories 2 and 3; Reject: Categories 1 and 4 (GM310 higher) |
| (e) Mean scores for all classes lumped together are zero. | t-test | post/pre test difference | Accept: Categories 1, 2 and 3; Reject: Category 4 (positive) |
| (f) Mean scores for the four GE classes lumped together are zero. | t-test | post/pre test difference | Accept: all four categories |
| (g) Mean scores for the two GM310 classes lumped together are zero. | t-test | post/pre test difference | Accept: Categories 2, 3; Reject: Categories 1, 4 |
| (h) Mean scores for each course separately are zero. | t-test | post/pre test difference | Reject: GM310.00, Cat. 4 (positive); GM310.01, Cat. 4 (positive); G302, Cat. 4 (positive); M302S, Cat. 3 (negative); Accept: all others |
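For readers unfamiliar with the F-tests listed in the table above, the following sketch computes the F statistic for a one-way analysis of variance across several classes. It assumes the F-tests were ordinary one-way ANOVA tests, which the text does not state explicitly, and the function name is hypothetical. Deciding whether to accept or reject still requires comparing the statistic with the critical value of the F distribution for the returned degrees of freedom.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    groups: list of lists of scores, one inner list per class.
    Assumes at least two classes and some variation within classes.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own class mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F means the class means differ by more than the scatter within classes would explain, which is the situation reported as "Reject" in the table.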
Both of these problems raise issues involving control of multiple factors. In the geosciences such control is relatively rare, and students in all five classes were unlikely to encounter a problem requiring interpretation of experiments that involve controls or in which controls are possible.
Many students chose (a), "bait but not soil condition". They received no credit for that choice, but if their explanation cited evidence that slugs were attracted to the bait (by referring explicitly to individual experimental cases), they received partial credit (typically 1 point). Students who made the correct choice and generally justified it logically (by referring explicitly to appropriate experimental cases) but claimed that slugs responded more strongly to bait than to soil condition received 4 out of a possible 5 points. Students who chose the correct answer but did not explain their reasoning by referring explicitly to appropriate experimental cases could receive only partial credit. Students who chose the correct answer but gave an explanation inconsistent with that choice received only 1 point (for the correct answer).
If we ignore statistical uncertainties (such as physiological differences among the monkeys in the four groups, differences in the amount of food each group ate, and differences in activity levels among the groups), then Group 4 acts as a control for Group 3, because the two are identical except that Group 3 received supplement B and Group 4 did not. Both groups gained weight, but supplement B appears to have reduced the weight gain relative to what would have happened without it, which is the basis for choosing (b). Group 2 has no bearing on the question, because its appropriate control, a group fed no supplements at all, did not exist. Group 1 provides no useful information about the question.
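The control-group logic in this paragraph can be made concrete with a small sketch. The group definitions below are hypothetical: they were chosen only to be consistent with the discussion above and are not the actual design from the question. The function simply looks for pairs of groups whose treatments differ in exactly one factor.

```python
from itertools import combinations

# HYPOTHETICAL design, consistent with the discussion but not taken from the question:
# each group is described by the set of supplements it received.
GROUPS = {
    1: frozenset({"C"}),
    2: frozenset({"B"}),
    3: frozenset({"A", "B"}),
    4: frozenset({"A"}),
}


def controlled_pairs(groups, factor):
    """Return (treated, control) pairs of groups that differ only in `factor`."""
    pairs = []
    for a, b in combinations(sorted(groups), 2):
        # The symmetric difference is the set of supplements given to one
        # group but not the other; a valid control differs only in `factor`.
        if groups[a] ^ groups[b] == {factor}:
            treated, control = (a, b) if factor in groups[a] else (b, a)
            pairs.append((treated, control))
    return pairs


print(controlled_pairs(GROUPS, "B"))  # [(3, 4)]
```

With this assumed design, only Groups 3 and 4 form a controlled comparison for supplement B. Group 2 would become usable only if a group receiving no supplements at all were added.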
Because of the uncertainties about comparability of monkeys among groups, activity levels, total food consumption, etc., it is possible to argue that with the information given, it is in principle not possible to tell with sufficient certainty whether supplement B influences weight gain. Hence, we gave full credit to students who chose (d) and justified the choice by citing some of these other possible effects on weight gain. However, most students who chose (d) did not cite these additional uncertainties, so we did not give them any credit at all (not even 1 point for choosing (d)). For example, it was relatively common for students to claim that we can't tell about the effects of supplement B because we don't know the initial mean weights of monkeys in each group. It was also common for students simply to restate (d) without offering any reasoning. Students who selected (d) because of the absence of a control group for Group 2 (that is, a group fed with no supplements at all) received 3 points out of 5 possible.
Most students chose (a), "Supplement B increases weight gain under some conditions", probably confusing "causes weight gain" with "increases weight gain", noting the results of Group 2 and either comparing it to results of Group 3 or failing to realize that a control group that ate food without any supplements at all would be needed to tell whether supplement B causes weight gain. Of course, because the question asks not about causing weight gain but rather causing differences in weight gain, this line of reasoning is based on a misreading of the question in the first place and these students received no credit at all. Some students chose (a) based on a comparison of Groups 3 and 4; we gave these answers 0 points.
Students who correctly chose (b) and noted correctly that Group 4 acts as a control group for Group 3, but referred to Group 2 as part of their reasoning, were given 4 out of 5 possible points (because Group 2 contributes no information to the question).
These questions were similar to those that we used as the basis for organizing the content and progression of our NOVA course, "Planetary Climate Change".
We then went over the instructions provided on the assessment instrument and asked the students to construct a hierarchical concept map in which they organized their own knowledge about climate, prompted by five key questions about climate. After 15-30 minutes, we collected the students' concept maps. (Especially at the beginning of the semester, 15 minutes was plenty of time because most students knew almost nothing about the subject. At the end of the semester, if the students had learned much about the subject at all, 15-30 minutes seemed to be enough time for the students to show a noticeable improvement if any such improvement was forthcoming--the exception rather than the rule in the control groups!)
A crude outline of the lines of connections associated with the key prompting questions (italics) and topics (bold-face) that we deemed important (and hence credit-worthy) looked something like this:
To increase scoring consistency, two of us--a meteorologist (Dempsey) with no previous experience scoring concept maps and some limited experience using concept maps as an instructional tool, and a secondary-science-education faculty member (O'Sullivan) with extensive experience scoring concept maps and using them for instruction but with limited knowledge about climate--scored student concept maps independently. In occasional cases where our scores differed markedly we consulted and rescored, but in general we simply averaged our two scores and performed our statistical analysis on the averaged scores.
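The two-scorer procedure can be sketched as follows. The threshold for a "marked" difference is not given in the text, so `max_gap` is a hypothetical parameter with an arbitrary default, and the function name is likewise an assumption.

```python
def combine_rater_scores(scores_a, scores_b, max_gap=5.0):
    """Average two raters' scores and flag maps whose scores differ markedly.

    scores_a, scores_b: dicts mapping a student key to that rater's score.
    max_gap: HYPOTHETICAL threshold; the source does not say what counted
             as a marked difference.
    Returns (averages, flagged), where flagged lists keys needing a rescore.
    """
    averages = {}
    flagged = []
    for key in sorted(scores_a.keys() & scores_b.keys()):
        a, b = scores_a[key], scores_b[key]
        if abs(a - b) > max_gap:
            # Leave these out until the raters have consulted and rescored.
            flagged.append(key)
        else:
            averages[key] = (a + b) / 2
    return averages, flagged
```

After the flagged maps are rescored, the same function can be run again on the revised scores to produce the averaged values used in the statistical analysis.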
| Hypothesis Tested | Type of Test | Score | Result of Test (5% significance level) |
| (a) Mean scores are the same among all six classes. | F-test | pre-test only | Reject |
| | | post/pre test difference | Reject |
| (b) Mean scores are the same among all classes, with GM310 classes lumped together. | F-test | pre-test only | Accept |
| | | post/pre test difference | Reject |
| (c) Mean scores are the same among the four GE classes. | F-test | pre-test only | Accept |
| | | post/pre test difference | Accept |
| (d) Mean pre-test scores for the two GM310 classes are the same. | t-test | pre-test only | Reject |
| | | post/pre test difference | Accept |
| (e) Mean pre-test scores for GM310 lumped together and for all four GE classes lumped together are the same. | t-test | pre-test only | Accept |
| | | post/pre test difference | Reject (GM310 scores higher) |
| (f) Mean score for all classes lumped together is zero. | t-test | post/pre test difference | Reject (difference positive) |
| (g) Mean score for the four GE classes lumped together is zero. | t-test | post/pre test difference | Accept |
| (h) Mean score for the two GM310 classes lumped together is zero. | t-test | post/pre test difference | Reject (difference positive) |
| (i) Mean score for each course separately is zero. | t-test | post/pre test difference | GM310.00: Reject (difference positive); GM310.01: Reject (difference positive); All others: Accept |
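The two kinds of t-test in the table above can be sketched in plain Python. The one-sample form tests whether the mean post/pre difference is zero; the two-sample form compares two groups of classes. The text does not say which two-sample variant was used, so the pooled-variance (Student's) form here is an assumption, as are the function names. As with the F-test, the statistic must still be compared with the critical t value for the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def one_sample_t(differences):
    """t statistic and degrees of freedom for H0: the mean difference is zero.

    differences: post-test minus pre-test score for each matched student.
    """
    n = len(differences)
    t_stat = mean(differences) / sqrt(variance(differences) / n)
    return t_stat, n - 1


def two_sample_t(x, y):
    """Pooled-variance t statistic and degrees of freedom for H0: equal means.

    ASSUMPTION: equal population variances in the two groups.
    """
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t_stat = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t_stat, nx + ny - 2
```

A positive one-sample statistic corresponds to the "difference positive" entries in the table, that is, post-test scores higher than pre-test scores on average.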