Wednesday, June 10, 2020
Problems With New PSAT Part 1 Percentile Inflation
[Part 1: Percentile Inflation is the first of a three-part report on the new PSAT. See Overview, Part 2: Score Discrepancies, and Part 3: Lowered Benchmark. The entire report can also be downloaded or distributed as a PDF.]

Part 1: Percentile Inflation

A series of changes has greatly increased the percentile scores that students and educators are seeing on PSAT score reports. College Board has not been transparent about all of the changes and the ways in which they can distort score interpretation.

Moving the Goalposts: A New, Hypothetical Measuring Stick

A shift in percentile reference groups has, perhaps, caused the most immediate and pervasive confusion in interpreting score reports. Many PSAT report recipients assume that percentiles are calculated directly from the current pool of test-takers. Surely, a 55th percentile score for a section should mean that of current test-takers, 55 out of 100 scored at or below that section score. That's not the case and has almost never been the case. College Board has long used the previous year's test-takers as the reference group for PSAT/NMSQT percentile calculations. Those students provided the measuring stick, so to speak. Since the pool of test-takers evolves slowly, the difference between comparing 2014 students to 2013 students and comparing 2014 students to each other would not have been pronounced.

With the 2015 PSAT, College Board has introduced an entirely new measuring stick, a nationally representative sample [also referred to as the National Representative sample], and made it the default norm for student score reports. Here is College Board's definition: "Nationally representative percentiles are derived via a research study sample of U.S. students in the student's grade (10th or 11th), weighted to represent all U.S. students in that grade, regardless of whether they typically take the PSAT/NMSQT." In other words, test-takers are compared to students who didn't even take the test and may never take the test. These percentiles are displayed prominently alongside student scores in both online and printed reports. We find that parents and students are using these percentiles as their primary source of information.

Unfortunately, these nationally representative percentiles have several problems: they provide a source of percentile inflation; they do not accurately compare students to the pool of students likely to take the SAT or ACT; they represent a break from past reporting and mean that these figures cannot be compared to any prior data; and they represent a "black box," since it is unclear exactly how the national sample is derived, how accurately it reflects the national pool of students, or when or if it will be modified in the future.

College Board often cites transparency as a goal for its programs and as a justification of the new PSAT and SAT. Nationally representative percentiles seem far less transparent than traditional test-taker percentiles. The new percentiles are not based on college-bound students. The new percentiles are not based on others taking the same exam. The new percentiles are based on numbers that can only be judged via a technical report, and such a report has yet to be released.

National Users: Students Become Users and Study Samples Replace Actual Results

An alternate set of percentiles, User: National, is also provided, but each score in the second set is only found several clicks deep in the online version of the report. In fact, a full student report contains 25 separate percentile scores. The temptation is to view "User" as interchangeable with the traditional notion of "test-takers," but that would be inaccurate.
The PSAT previously presented percentiles based on "[students] who took the test last year," but the new PSAT has no "last year" from which to draw. Rather than opting to use actual student data from 2015 test-takers, College Board created a new reference group: "User group percentiles are derived via a research study sample of U.S. students in the student's grade, weighted to represent students in that grade (10th or 11th) who typically take the PSAT/NMSQT." This procedure is not uncommon, but that does not ensure that it was done accurately this time. At minimum, it creates another black box for students and educators. It's a remedy that did not need to exist.

Consider, by contrast, that the SAT and ACT are taken on many different dates over sophomore, junior, and senior year of high school. If a student is to get an accurate sense of how she stacks up to other students in her class, data from her class's testing history must be consolidated. Since percentiles cannot be calculated contemporaneously with score reports, the testing organizations use scores from a prior group of test-takers. College Board uses the previous class year for the SAT, whereas ACT traditionally uses the prior three years.

The data consolidation rationale does not exist for PSAT/NMSQT percentiles. As of October 28, 2015, every student who would ever take the 2015 PSAT/NMSQT had done so. Full results could have been tabulated and used for percentile calculation and reporting. Instead, College Board elected to use a sampling method that has not been disclosed and that is subject to the error inherent in any sampling.

Nationally Representative Scores Result in Percentile Inflation

The table below shows how the Nationally Representative percentiles differ from those for User and increase expected percentile scores. ACT uses prior test-takers only, so the PSAT is unique in this source of inflation.
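
To see why the choice of reference group matters, consider a minimal sketch in Python. Every number in it is hypothetical and chosen only for illustration; College Board has not published its sampling weights, and the function name and group labels here are assumptions rather than any part of College Board's method. The sketch computes an at-or-below percentile for the same score against two weighted populations: students who typically take the test, and that group combined with students who typically do not.

```python
def weighted_percentile_at_or_below(records, score):
    """Percentage of total weight held by records scoring at or below `score`.

    `records` is a list of (score, weight) pairs; the weight stands in for
    how many students each sampled score is meant to represent.
    """
    total_weight = sum(weight for _, weight in records)
    weight_at_or_below = sum(weight for s, weight in records if s <= score)
    return 100.0 * weight_at_or_below / total_weight


# HYPOTHETICAL data: (section score, weight). Not College Board figures.
typical_takers = [(400, 20), (500, 35), (600, 45)]
non_takers = [(400, 25), (500, 10), (600, 15)]

# "User"-style group: only students who typically take the test.
user_pct = weighted_percentile_at_or_below(typical_takers, 500)

# "Nationally representative"-style group: everyone in the grade.
national_pct = weighted_percentile_at_or_below(typical_takers + non_takers, 500)

print(f"User percentile for 500:     {user_pct:.0f}")
print(f"National percentile for 500: {national_pct:.0f}")
```

Because the added students in this made-up example score lower on average, the same 500 moves from the 55th to the 60th percentile without any change in the student's performance.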

A Percentile by Any Other Name: College Board Changes a Definition

A more fundamental change underlies all of the percentile scores on the new PSAT report. Few people give much thought to the various ways percentiles are defined, because the measure seems so simple to understand. [In this report, the vernacular "percentiles" will be used with no attempt to distinguish among percentiles, percentile rank, or cumulative percentages.] In standardized test reporting, the two most common ways of defining percentiles for test-takers differ so slightly that the distinction often gets overlooked:

Definition A: The percentage of students scoring below you.
Definition B: The percentage of students scoring at or below your score.

Definition B produces higher values in almost all cases and never gives lower values. College Board shifted from Definition A to Definition B this year, introducing an additional source of percentile inflation. Understood in context, there is no negative implication to this inflation; the new definition is just as valid and, perhaps, easier for a layperson to understand. The context, though, is easily lost. There are no red asterisks alerting readers to the change, so students and educators are understandably and incorrectly comparing 2015 percentiles to those from previous years.

Traditionally, College Board used Definition A and ACT used Definition B. It seems fitting that as the SAT and ACT grow more similar in content, their respective organizations now agree on Definition B. It is unclear if College Board will be using this definition for all of its exams.

Below is an excerpt of percentile tables for the new PSAT; columns have been added for Definition A to demonstrate how percentile inflation can be observed.
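
The two definitions can also be compared directly in code. The following Python sketch uses a small, invented list of 20 scores (not College Board data), and the function names are illustrative assumptions. It counts scores strictly below a target for Definition A and scores at or below it for Definition B.

```python
from bisect import bisect_left, bisect_right


def percentile_below(scores, score):
    """Definition A: percentage of scores strictly below `score`."""
    ordered = sorted(scores)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


def percentile_at_or_below(scores, score):
    """Definition B: percentage of scores at or below `score`."""
    ordered = sorted(scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# HYPOTHETICAL sample of 20 section scores, for illustration only:
# 10 scores below 500, 1 score of exactly 500, 9 scores above 500.
sample = [400] * 4 + [450] * 6 + [500] * 1 + [550] * 5 + [600] * 4

definition_a = percentile_below(sample, 500)
definition_b = percentile_at_or_below(sample, 500)

print(f"Definition A: {definition_a:.0f}th percentile")
print(f"Definition B: {definition_b:.0f}th percentile")
```

The gap between the two results is exactly the share of students who earned the target score, which is why Definition B can never come out lower than Definition A.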
Neither of the percentile definitions provides full information, because percentiles do not convey how many people achieved the same score. Any air traveler has encountered a variant of this problem when Group 1 is called and 200 passengers rise as one. Having a high score is not as good if too many people share your score.

Definition A indicates what percentage of students achieved lower scores, but it cannot convey what percentage scored higher. For example, by looking at the table above at the cell for a score of 500 Math under Definition A, we can tell that 50 percent of students scored lower. Without referring to other cells, though, we do not know how many scored higher than 500 (it turns out to be 45 percent). In the case of Definition B, we would see that a 500 is the 55th percentile and know that 55 percent scored at 500 or below. This tells us that 45 percent scored higher (100 - 55 = 45). We would not know how many students scored below a 500.

The definitions do not change the underlying data, but students and educators are only provided a single value on their score reports, and it is the new, higher value. It can feel like having $60 in your pocket rather than having $50. It's a nice feeling until you realize that prices have uniformly gone up by 20%. Across the middle, meatiest part of the score range, the change in definition raises percentiles by 2 to 6 points. It is plausible that College Board moved from Definition A to Definition B in part to give the feel-good impression of $60, especially since ACT was already handing out the extra bills.

Adding Up the Changes

The percentile inflation caused by the new definition and the inflation caused by the new reference group are effectively additive. Under 2014's percentile and reference group definitions, a 500 Math score would be presented as 50th percentile. On a 2015 score report, however, a 500 Math score is the 60th percentile. Percentile inflation is as high as 10 percentile points over part of the scale.
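
The arithmetic for the 500 Math example can be laid out step by step. The Python sketch below uses only the three percentile figures quoted in this section. The split of the total into a definition effect and a reference-group effect rests on an assumption: that the 55th percentile figure was computed against the same test-taker group as the 50th, which the text implies but does not state outright. The variable names are illustrative.

```python
# Figures quoted in the text for a 500 Math score.
below_pct = 50          # Definition A, 2014-style test-taker reference group
at_or_below_pct = 55    # Definition B (ASSUMED to use the same reference group)
reported_2015_pct = 60  # Definition B, nationally representative sample

# What the two definitions jointly reveal about the distribution.
tied_pct = at_or_below_pct - below_pct   # share scoring exactly 500
above_pct = 100 - at_or_below_pct        # share scoring higher than 500

# Decomposition of the overall inflation (under the assumption above).
definition_effect = at_or_below_pct - below_pct
reference_group_effect = reported_2015_pct - at_or_below_pct
total_inflation = reported_2015_pct - below_pct

print(f"Scored exactly 500:     {tied_pct}%")
print(f"Scored above 500:       {above_pct}%")
print(f"Definition effect:      +{definition_effect} points")
print(f"Reference-group effect: +{reference_group_effect} points")
print(f"Total inflation:        +{total_inflation} points")
```

Under that assumption the two effects sum to the full 10-point gap, which is what "effectively additive" means in practice.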

At the edges of the scale, the absolute change is smaller, but the proportional impact is higher. For example, while the 99th percentile is only 2 percentile points higher than the 97th percentile, the cumulative reporting changes from 2014 mean a doubling or tripling of students receiving the higher figure. With 3% of students boasting 99th percentile scores this year, there are important implications for how students, parents, and counselors forecast National Merit scores, for example. Counselors typically see hundreds of reports, so they have been observing this proliferation of high scores without necessarily knowing why.

Percentile Inflation is Distinct from Score Inflation

The changes described above all relate to how percentiles are higher than ones reported in the past. Although scoring inconsistencies also appear to exist this year as a separate concern, percentile inflation in and of itself does not provide evidence that scores have been miscalculated or mis-distributed.

Why Did College Board Make These Changes?

In the course of replacing the old PSAT with the new PSAT, College Board has drawn on samples rather than actuals, swapped in new measuring sticks, and redefined how the measuring gets done. Why? It is hard to make a case for how students benefit from these changes. Percentile inflation may shift test planning decisions in unwelcome ways, and the sampling methodology and expanded comparison pool do little to answer questions about how a student's scores stack up against those of other college applicants.

College Board, though, has multiple motives for making these shifts. The organization is under intense competitive pressure from ACT and other testing companies in the fight over whose testing products will be chosen to assess students from middle school to high school graduation. College Board cannot tolerate a competitive disadvantage just to preserve an old definition.
Rebranding its ReadiStep product as PSAT 8/9, creating a vertical scale that tracks students across all of its PSAT and SAT instruments, and rebadging the PSAT as the PSAT 10 when taken by sophomores in the spring have all been decisions to expand what College Board now dubs "The SAT Suite of Assessments."

College Board has strived to close any real or perceived competitive deficit, and the shift to the national sample fits into the organization's long-term plans. States and school districts are increasingly contracting with the organization to offer the PSAT or SAT to all of their students rather than just a self-selected group of college-bound students. These bulk buyers prefer standards that compare their students to all grade-equivalent students. The PSAT 8/9 is taken by far fewer students than the PSAT/NMSQT. Test-taker or user percentiles are more susceptible to change from exam to exam. College Board would ultimately like to offer the PSAT to every student across the country. It is, in essence, setting a benchmark with the goal of growing into it.

There are statistical reasons, too, why preference was given to a research study sample. Test makers generally want reporting data such as percentiles calculated prior to the administration of a new form. In hindsight, this preference was a risky decision for the PSAT given scrutiny of the exam by both proponents and critics. Many people are left wondering, "Is there something to hide?"

The reasons behind the decision to change the percentile definition and the default reference group may be valid, but the fact that the changes tend to amplify the percentiles and include an opaque leap from a test-taker group to a Nationally Representative sample creates a dubious impression. A productive solution would be to release the actual numbers for test-takers and publish all research study results.
The new SAT debuts on March 5, 2016, and many of its components are being built on the same research studies and with the same methods used for the PSAT. It would seem prudent to establish credibility with PSAT data now rather than play catch-up after final SAT numbers are released.

[Continue to Part 2: Score Discrepancies]