On February 25, 2016, my son’s 3rd grade Florida State Assessment (FSA) score report was delivered to his school, (ironically) by express delivery, and I was finally given a copy showing his percentile performance on last year’s assessment. For reasons never explained, the state reported his scores late. He is scheduled to take the 4th grade FSA Writing component on March 1st (5 days after we received last year’s results). True story.
Last Spring, the inaugural administration of the Florida State Assessment (FSA) was marred by computer glitches, cyber attack and validity questions. Round Two begins on February 29, 2016. Has anything changed? Is Florida ready?
Last Spring: Frozen Screens. Service denials. Cyber attacks. Server Crashes. Log-in failures. This year: Low Expectations.
The computer-based FSA had problems from the start. “From the get-go, the FSA test was riddled with technical problems, including a cyber attack from an unknown source and a slow-moving AIR server which left thousands of students with error messages in their language arts and mathematics tests.” (read about it here and here)
The Florida Department of Education did launch an investigation into the server attack that, presumably, had left thousands of students across the state unable to take the new FSA, but 6 months later, the Department still had no answers as to who was behind the server attack or why it took place (details here). Is that investigation still ongoing? Are adequate safeguards in place?
In September, State Education Commissioner Pam Stewart announced that she would seek liquidated damages from American Institutes for Research, or AIR, the creator of the FSA (damages which are clearly delineated in the State’s executed contract, and collection of which is required, “if applicable,” by Florida State Statute). Nevada has successfully recovered $1.3 million in damages from AIR for similar testing disruptions there (details here). Recently, an attorney from the FLDOE refused to comment on the status of this, stating it was “pending litigation,” but we can find no public record of any pending litigation. Is the State still pursuing this?
This year, expectations for FSA computer-based testing are being set low, as reported here.
“The Florida Department of Education, its testing vendor American Institutes for Research, along with districts and schools, have taken several steps to prevent such troubles. Those include expanding bandwidth, upgrading defenses against outside attacks and improving testing software.
Even with such moves, though, the department warned that students still might encounter interruptions beyond their control. And that, said FairTest public education director Bob Schaeffer, could hurt some children.
Imagine the impact when the screen goes blank for a seventh-grader taking a civics test required to get out of middle school, Schaeffer said. “For an emotional adolescent to experience that, it’s a scary situation.”
Yet there’s almost no way to guarantee trouble-free computerized testing on a stage as large as Florida’s, experts said.
So… expect glitches. Lots of them.
Earlier this month, students across Florida, when asked to participate in district “infrastructure testing,” reported widespread issues with the system in multiple counties: screens freezing, difficulty loading, bugs in the program, unexpected shutdowns. It appears there is still “work to be done.” Similar situations in Tennessee led that state to “pull the plug” on this year’s computer testing (read about it here). What would it take for Florida to pull the plug and return to paper and pencil tests, as recommended by many of Florida’s superintendents?
There are more reasons than technical issues alone to question continuing with computer testing. Reports (here and here) from Illinois, New Jersey and Maryland have shown that students who took the 2014-15 PARCC exams (similar to Florida’s FSA) via computer tended to score lower than those who took the exams with paper and pencil—a discovery that raises questions about the validity of the test results and poses potentially big problems for state and district leaders. Why evaluate student proficiency with an assessment that under-represents their abilities?
Were similar results found with the Florida assessments? Did students perform better on paper tests than computer tests? Is the comparison even being made? No one knows because the 2015 FSA Technical Report, due out in January, has yet to be released.
Technical Report not complete
The 2015 FSA Technical Report, usually published by the January following a test’s administration, is still (per personal communication with the FLDOE on 2/25/16) “undergoing final reviews and will be available shortly.” In the past, FCAT Technical Reports (past examples can be found here) focused on test validity and reliability, “key concerns for establishing the quality of an achievement test such as the FCAT,” and psychometric analysis was the major focus of those reports. Without a completed report, is Florida proceeding with this year’s FSA administration in the absence of proof of psychometric validity?
Interestingly, the 2014 Florida Statewide Assessments (FCAT 2.0) Technical Report (released in 12/2014), on page 137, suggested that further studies were needed to verify some implication arguments. “This is especially true,” it read, “for the inference that the state’s accountability program is making a positive impact on student proficiency and school accountability without causing unintended negative consequences.” (Emphasis mine.) I am especially eager to learn whether Florida completed these “further studies,” because it appears there are LOTS of unintended consequences in the current accountabaloney system.
An Incomplete and Not Independent Validity Study
Questions regarding the validity of the FSA began to be asked last March, when Commissioner Stewart testified before a Senate Education Committee that the FSA had been validated in Utah and she promised to provide the senators with those reports. No such reports were ever delivered. This prompted legislators to pass a bill requiring the department to hire an independent company to verify whether the FSA was a valid tool to assess the academic achievement of Florida’s students.
Let’s take a little pause, here, to review how to determine whether a test is valid or reliable. This article outlines the process (you can substitute “FSA” for “SBAC” and the article would still, mostly, hold true):
“With real scientific educational research, a group of independent researchers, not bought off by any billionaire, would select a large group of representative students, such one thousand 8th graders randomly chosen from an urban school district. These students would be randomly assigned to an experimental group and a control group. A survey would be given to each group to make sure that the groups were matched on important characteristics such as free and reduced lunch status and ELL status. Each group would be given a series of tasks such as completing the 8th grade NAEP Math test, the 8th grade SBAC Math test, the 8th Grade MAP test, the 8th Grade MSP test and/or the 8th Grade Iowa Test of Basic Skills. These would then be compared against comparison measures such as teacher grades in their previous and current year math courses. The actual test questions of every test would be published along with student scores for each test question on each test. Objective analysis and conclusions could then be made about the reliability and validity of various measures using these carefully constructed norming studies. This results and conclusions would be peer reviewed and quite often the entire study could and would be replicated by other independent researchers at other major Universities. “
Suffice it to say, Florida’s “independent” validity study didn’t do that.
Partial, hardly independent, FSA Validity Study
Florida hired Alpine Testing Solutions to perform the mandated validity study. Alpine partnered with EdCount, a partner of AIR, the test creator. The project team contained previous AIR employees. So much for the independent part…
The Alpine Report was presented to the FLDOE on August 31, 2015, after the DOE was allowed to review and make suggestions regarding two earlier drafts of the report (more here). The final report was released to the public on September 1st. The FLDOE announced the study showed the FSA to be “valid,” a claim that was challenged by educators (more here and here). The report recommended against using the results from the computer-based assessments as the sole factor in determining individual consequences, such as whether students should be promoted, retained or remediated, but concluded those same scores COULD be used to evaluate teachers and schools. Not surprisingly, many wondered how a test not found to accurately measure student achievement could be used to rank teachers, schools and districts.
While many debated the contents of the Alpine report, what was NOT in the report was equally interesting. Alpine was charged with reviewing the grade 3-10 English Language Arts (ELA) exams, the grades 3-8 Math assessments and the Algebra 1, Algebra 2 and Geometry End of Course (EOC) exams. Because of time constraints, however, the report ONLY evaluated ELA exams for grades 3, 6, and 10, Math exams for grades 4 and 7 and the Algebra 1 EOC. This leaves 11 of the 17 new FSA exams (ELA grades 4, 5, 8, and 9, Math grades 3, 5, 6, and 8, and Algebra 2 and Geometry EOCs) UNEVALUATED and possibly invalid. When will those tests be evaluated? Before the next FSA administration? That seems unlikely, since the next administration begins on Monday (2/29/16).
Also missing from the Alpine Report is any evaluation of whether the FSA is valid, fair or reliable for vulnerable populations, such as special needs students, English language learners and other at-risk populations, as outlined by Dr. Gary Thompson here.
“…Due to the limited time frame for developing the FSA, item reviews related to content, cognitive complexity, bias/sensitivity, etc. were not conducted by Florida stakeholders.” (Alpine Testing Solution, Inc. Validity Report P.35)
Per Dr. Thompson, “Neither Utah nor Florida has produced validity documents suggesting that either the SAGE or FSA high stakes academic achievement tests can validly measure achievement in vulnerable student populations, or that the current testing accommodations allowed or banned, are appropriate or fair.”
Without determining fairness for vulnerable sub-populations of students, continuing to use FSA scores to retain, remediate or prevent students from graduating is unconscionable. How can an accountability system be based on an invalid or unfair measurement?
So, in summary:
- Expect computer glitches. Keep your FSA expectations LOW.
- Don’t hold your breath for any liquidated damages from last year’s FSA fiasco, despite being spelled out in the AIR contract and mandated by statute.
- State assessments, in general, are not appropriately validated. Will Florida ever evaluate whether, like the PARCC assessments, FSA scores are lower on computer than on pencil and paper?
- The validation of the FSA is, at best, incomplete; the Alpine study failed to evaluate 11 of the 17 new assessments.
- There are no documents suggesting the FSA can validly or fairly measure achievement in vulnerable student populations, yet those students suffer the same consequences as their more advantaged classmates.
- Don’t expect the FSA Technical Report to address the unintended negative consequences of the current system.
- My son’s test scores, delivered just 5 days before this year’s test, will not inform his instruction.
- The Accountabaloney will continue until further notice.
Or, in other words, “Same song, second verse, could get better but it just gets worse.”
FSA testing starts Monday, whether we are ready or not.