Last year I posted about a quick study I did involving the Classroom Test of Scientific Reasoning. In the post, I showed that students made gains in their scores on the test when they were shown physical demonstrations of the apparatus. For example, I placed two balls of putty on a balance, showed that they were equal, then squished one. The results suggested that many of the students were getting caught up on the language of the test.
This year, I repeated the experiment with my new class of grade 11 physics students. However, there was only one changed answer in this second attempt. I take this to suggest that the current group of students understood the test questions much better. This parallels my experience, as most of the students in the class have been at our school for a few years, whereas last year many had just arrived after moving from abroad.
Test results from 2014 and 2015. A lower line is better. In 2015 70% of my students scored 6 or less (similar to the US national average for high school students), compared with 50% in 2014, and only 30% in 2014 after the demonstration.
I haven’t yet gone through the answers to look at the students’ reasoning skills, but the results look similar to the pre-demonstration scores from last year, which I think means the students have a lower overall comfort level with scientific reasoning. My challenge is to find ways to embed these skills in our studies this year.
Back in August, I wrote about my attempt to understand how English communication was getting in the way of measuring scientific reasoning skills. I assigned my students 20 of the questions from Lawson’s Classroom Test of Scientific Reasoning (CTSR), dropping the four linguistically toughest. Once the students had finished the test in the regular way, I demonstrated the scenarios one at a time to provide context.
You can read more about the assessment and the results here.
There are two things that I’ve been meaning to adjust about my results. First, in my analysis I forgot that a response should only be counted as correct if both the answer and the follow-up “why” question are answered correctly. Thus, instead of a score out of 20, I should have a score out of 10. Second, I wanted to add a comparison to norms. In the graph below, the “Norm” line comes from a scaled (from 13 to 10) version of the results compiled by the Frameworks for Inquiry project. This line corresponds to the scores of 3800 American students from grades 10 to 12, and has been meaningfully connected with such things as Piagetian developmental stages.
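That paired scoring rule can be sketched in a few lines of Python. The answer key and student responses below are hypothetical placeholders, not real CTSR content: the point is only that consecutive items form (answer, reason) pairs and a pair scores 1 only when both are right.

```python
# Paired scoring for a 20-item test treated as 10 (answer, reason) pairs.
# A pair scores 1 only if BOTH the answer and its "why" follow-up are correct.
# All response strings here are invented placeholders.

def paired_score(responses, key):
    """Pair i consists of items 2i and 2i+1; return the number of fully
    correct pairs (a score out of len(key) // 2)."""
    score = 0
    for i in range(0, len(key), 2):
        if responses[i] == key[i] and responses[i + 1] == key[i + 1]:
            score += 1
    return score

key = list("ABCDABCDABCDABCDABCD")       # hypothetical answer key (20 items)
student = list("ABCDABCXABCDABCDABCD")   # one wrong "why" answer (item 8)
print(paired_score(student, key))        # -> 9 out of 10
```

Under this rule a single wrong reason costs the whole pair, which is why my rescored totals drop to a maximum of 10.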
I think it is clear that the blue line (my students in the normal testing situation) is pretty close to the norm data, while the post-demonstration results (in red) look to be quite different on this cumulative frequency graph. This should be taken as an indication that linguistic difficulties are an important factor in determining the score of the CTSR.
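The cumulative frequency curves behind that comparison are easy to reproduce; here is a minimal sketch using invented score lists (not my students’ actual data). For each score threshold, the curve gives the fraction of students scoring at or below it, so a lower curve means more students with higher scores.

```python
# Cumulative frequency: fraction of students scoring <= each threshold.
# The score lists below are hypothetical, for illustration only.

def cumulative_fraction(scores, max_score=10):
    n = len(scores)
    return [sum(s <= t for s in scores) / n for t in range(max_score + 1)]

pre_demo  = [2, 3, 3, 4, 5, 5, 6, 6, 7, 8]   # placeholder "blue line" data
post_demo = [4, 5, 6, 6, 7, 7, 8, 8, 9, 9]   # placeholder "red line" data

print(cumulative_fraction(pre_demo))
print(cumulative_fraction(post_demo))
```

Plotting each list of fractions against the thresholds 0–10 gives curves like the ones in the graph above, with the post-demonstration curve sitting below the pre-demonstration one.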
My grade 11 physics class is a collection of students from a handful of different education systems and backgrounds, so I wanted to use Lawson’s Classroom Test of Scientific Reasoning [CTSR] to get a sense of the students’ reasoning skills. Higher CTSR scores have also been correlated with improved performance, so I wanted to see if the possibility exists for me to target reasoning skills with my instruction.
Unfortunately, like most of the standard assessment tools, the CTSR is a bit wordy. That is intimidating to my students, all of whom speak English as a second, third, or fourth language. To help students understand what the questions are asking, I assembled replica demonstrations of the scenarios described in the CTSR: two balls of clay with equal masses (one pressed into a pancake), two graduated cylinders with different widths demarcated with 1, 2, 3, etc. instead of units of volume, and so forth. Since our school has a strict animal experimentation policy, I had to skip 6 of the questions. I also skipped the last four, since they are so wordy even most native English speakers don’t fully read them!
- The Lawson CTSR was applied as specified.
- After 30 minutes, the students were instructed to turn over their answer sheet and use a second answer sheet on the back.
- Students were informed that they would write the test again, this time with some visual aids, but given no feedback about their original performance.
- After this explanation, I said as little as possible, merely demonstrating the apparatus.
- After the second application of the test, students were asked whether they thought they had the same answers the second time. 12 out of 14 claimed they did (in reality, only 6 had exactly the same answers).
During the second iteration, 8 of 14 students had at least one answer that was different, including the 7 lowest-scoring students. Of the changed answers, 27 went from wrong to right and only 2 from right to wrong. This impressive record suggests that most changes were made because of an improved understanding of the situation. Among the students with different answers, there is an average increase of 3.375 points per student (out of 20, so 17%).
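As a quick check on that arithmetic, here is a sketch using only the counts reported above (hard-coded, not recomputed from the raw answer sheets). The reported 3.375 corresponds to the 27 wrong-to-right changes spread over the 8 students who changed answers.

```python
# Arithmetic check on the reported answer-change statistics.
# Counts are taken directly from the text above, not from raw data.

wrong_to_right = 27        # answers changed from wrong to right
right_to_wrong = 2         # answers changed from right to wrong
students_with_changes = 8  # students who changed at least one answer
items = 20                 # items on the (trimmed) test

avg_gain = wrong_to_right / students_with_changes
print(avg_gain)                        # -> 3.375 points per changing student
print(round(100 * avg_gain / items))   # -> 17 (percent of the 20-item test)
```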
The results from this graph show a different classroom dynamic once we begin to account for language difficulties. Misdiagnosing language challenges as conceptual misunderstandings can lead to problems and frustration. If you are using the CTSR, the FCI, or any other baseline instrument, be careful about your audience. And, for those who create standardized tests, especially for students in international settings (*cough* IB *cough*), it is essential that the gist of each problem can be understood without students getting caught up in the language.