In the 1970s, researchers conducted a study that pitted a moral incentive against an economic incentive.... They wanted to learn about the motivation behind blood donations. Their discovery: when people were given a small stipend for donating blood rather than simply being praised for their altruism, they tended to donate less blood. The stipend turned a noble act of charity into a painful way to make a few dollars, and it wasn't worth it.
What if the blood donors had been offered an incentive of $50, or $500, or $5,000? Surely the number of donors would have changed dramatically.
But something else would have changed dramatically as well, for every incentive has its dark side. If a pint of blood were suddenly worth $5,000, you can be sure that plenty of people would take note. They might literally steal blood at knifepoint. They might pass off pig blood as their own. They might circumvent donation limits by using fake IDs. Whatever the incentive, whatever the situation, dishonest people will try to gain an advantage by whatever means necessary.
Or, as W.C. Fields once said: a thing worth having is a thing worth cheating for.
For every clever person who goes to the trouble of creating an incentive scheme, there is an army of people, clever and otherwise, who will inevitably spend even more time trying to beat it. Cheating may or may not be human nature, but it is certainly a prominent feature in just about every human endeavor. Cheating is a primordial economic act: getting more for less. So it isn't just the boldface names - inside-trading CEOs and pill-popping ballplayers and perk-abusing politicians - who cheat. It is the waitress who pockets her tips instead of pooling them. It is the Wal-Mart payroll manager who goes into the computer and shaves his employees' hours to make his own performance look better. It is the third grader who, worried about not making it to the fourth grade, copies test answers from the kid sitting next to him.
Some cheating leaves barely a shadow of evidence. In other cases, the evidence is massive. Consider what happened one spring evening at midnight in 1987: seven million American children suddenly disappeared. The worst kidnapping wave in history? Hardly. It was the night of April 15, and the Internal Revenue Service had just changed a rule. Instead of merely listing the name of each dependent child, tax filers were now required to provide a Social Security number. Suddenly, seven million children - children who had existed only as phantom exemptions on the previous year's 1040 forms - vanished, representing about one in ten of all dependent children in the United States.
The incentive for those cheating taxpayers was quite clear. The same for the waitress, the payroll manager, and the third grader. But what about that third grader's teacher? Might she have an incentive to cheat? And if so, how would she do it?
Imagine now that... you are running the Chicago Public Schools, a system that educates 400,000 students each year.
The most volatile current debate among American school administrators, teachers, parents, and students concerns "high-stakes" testing. The stakes are considered high because instead of simply testing students to measure their progress, schools are increasingly held accountable for the results.
The federal government mandated high-stakes testing as part of the No Child Left Behind law, signed by President Bush in 2002. But even before that law, most states gave annual standardized tests to students in elementary and secondary school. Twenty states rewarded individual schools for good test scores or dramatic improvement; thirty-two states sanctioned the schools that didn't do well.
The Chicago Public School system embraced high-stakes testing in 1996. Under the new policy, a school with low reading scores would be placed on probation and face the threat of being shut down, its staff to be dismissed or reassigned. The CPS also did away with what is known as social promotion. In the past, only a dramatically inept or difficult student was held back a grade. Now, in order to be promoted, every student in third, sixth, and eighth grade had to manage a minimum score on the standardized, multiple-choice exam known as the Iowa Test of Basic Skills.
Advocates of high-stakes testing argue that it raises the standards of learning and gives students more incentive to study. Also, if the test prevents poor students from advancing without merit, they won't clog up the higher grades and slow down good students. Opponents, meanwhile, worry that certain students will be unfairly penalized if they don't happen to test well, and that teachers may concentrate on the test topics to the exclusion of more important lessons.
Schoolchildren, of course, have had incentive to cheat for as long as there have been tests. But high-stakes testing has so radically changed the incentives for teachers that they too now have added reason to cheat. With high-stakes testing, a teacher whose students test poorly can be censured or passed over for a raise or promotion. If the entire school does poorly, federal funding can be withheld; if the school is put on probation, the teacher stands to be fired. High-stakes testing also presents teachers with some positive incentives. If her students do well enough, she might find herself praised, promoted, and even richer: the state of California at one point introduced bonuses of $25,000 for teachers who produced big test-score gains.
And if a teacher were to survey this newly incentivized landscape and consider somehow inflating her students' scores, she just might be persuaded by one final incentive: teacher cheating is rarely looked for, hardly ever detected, and just about never punished.
How might a teacher go about cheating? There are any number of possibilities, from brazen to subtle. A fifth-grade student in Oakland recently came home from school and gaily told her mother that her super-nice teacher had written the answers to the state exam right there on the chalkboard. Such instances are certainly rare, for placing your fate in the hands of thirty prepubescent witnesses doesn't seem like a risk that even the worst teachers would take. (The Oakland teacher was duly fired.) There are more nuanced ways to inflate students' scores. A teacher can simply give students extra time to complete the test. If she obtains a copy of the exam early - that is, illegitimately - she can prepare them for specific questions. More broadly, she can "teach to the test," basing her lesson plans on questions from past years' exams, which isn't considered cheating but may well violate the spirit of the test. Since these tests all have multiple-choice answers, with no penalty for wrong guesses, a teacher might instruct her students to randomly fill in every blank as the clock is winding down, perhaps inserting a long string of Bs or an alternating pattern of Bs and Cs. She might even fill in the blanks for them after they've left the room.
But if a teacher really wanted to cheat - and make it worth her while - she might collect her students' answer sheets and, in the hour or so before turning them in to be read by an electronic scanner, erase the wrong answers and fill in correct ones. (And you always thought that no. 2 pencil was for the children to change their answers.) If this kind of teacher cheating is truly going on, how might it be detected?
To catch a cheater, it helps to think like one. If you were willing to erase your students' wrong answers and fill in correct ones, you probably wouldn't want to change too many wrong answers. That would clearly be a tip-off. You probably wouldn't even want to change answers on every student's test - another tip-off. Nor, in all likelihood, would you have enough time, because the answer sheets have to be turned in soon after the test is over. So what you might do is select a string of eight to ten consecutive questions and fill in the correct answers for, say, one-half or two-thirds of your students. You could easily memorize a short pattern of correct answers, and it would be a lot faster to erase and change that pattern than to go through each student's answer sheet individually. You might even think to focus your activity toward the end of the test, where the questions tend to be harder than the earlier questions. In that way, you'd most likely substitute correct answers for wrong ones.
If economics is a science primarily concerned with incentives, it is also - fortunately - a science with statistical tools to measure how people respond to those incentives. All you need are some data.
In this case, the Chicago Public School system obliged. It made available a database of the test answers for every CPS student from third grade through seventh grade from 1993 to 2000. This amounts to roughly 30,000 students per grade per year, more than 700,000 sets of test answers, and nearly 100 million individual answers. The data, organized by classroom, included each student's question-by-question answer strings for reading and math tests. (The actual paper answer sheets were not included; they were habitually shredded soon after a test.) The data also included some information about each teacher and demographic information for every student, as well as his or her past and future test scores - which would prove a key element in detecting the teacher cheating.
Now it was time to construct an algorithm that could tease some conclusions from this mass of data. What might a cheating teacher's classroom look like?
The first thing to search for would be unusual answer patterns in a given classroom: blocks of identical answers, for instance, especially among the harder questions. If ten very bright students (as indicated by past and future test scores) gave correct answers to the exam's first five questions (typically the easiest ones), such an identical block shouldn't be considered suspicious. But if ten poor students gave correct answers to the last five questions on the exam (the hardest ones), that's worth looking into. Another red flag would be a strange pattern within any one student's exam - such as getting the hard questions right while missing the easy ones - especially when measured against the thousands of students in other classrooms who scored similarly on the same test. Furthermore, the algorithm would seek out a classroom full of students who performed far better than their past scores would have predicted and who then went on to score significantly lower the following year. A dramatic one-year spike in test scores might initially be attributed to a good teacher; but with a dramatic fall to follow, there's a strong likelihood that the spike was brought about by artificial means.
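One of those red flags, a student who gets the hard questions right while missing the easy ones, can be sketched in a few lines of Python. This is only an illustration and not the algorithm the study actually used: the function name `hard_minus_easy`, the split of the test into thirds, and the two made-up students are all assumptions, and the sketch leans on the text's remark that later questions tend to be harder. It also skips the comparison against similar-scoring students in other classrooms.

```python
def hard_minus_easy(correct, parts=3):
    """Share correct in the last 1/parts of the test minus the share
    correct in the first 1/parts.

    `correct` is a list of booleans, one per question, in test order.
    Assumption: questions run roughly from easiest to hardest.
    """
    n = max(1, len(correct) // parts)
    easy, hard = correct[:n], correct[-n:]
    return sum(hard) / n - sum(easy) / n


# Two made-up students on a hypothetical 12-question test.
typical = [True] * 4 + [True, False, True, False] + [False] * 4
odd = [False, True, False, False] + [True, False, False, True] + [True] * 4

print(hard_minus_easy(typical))  # negative: did better on the easy questions
print(hard_minus_easy(odd))      # positive: did better on the hard questions
```

A positive number alone proves nothing about any one student; it becomes interesting only when many students in the same classroom show it at once.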
Consider now the answer strings from the students in two sixth-grade Chicago classrooms who took the identical math test. Each horizontal row represents one student's answers. The letter a, b, c, or d indicates a correct answer; a number indicates a wrong answer, with 1 corresponding to a, 2 corresponding to b, and so on. A zero represents an answer that was left blank. One of these classrooms almost certainly had a cheating teacher and the other did not. Try to tell the difference - although be forewarned that it's not easy with the naked eye.
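For readers who would rather let a computer do the squinting, the encoding just described is easy to unpack. The following Python sketch is an assumption-laden illustration: the function names `decode` and `num_correct` are invented here, and the example string is made up rather than taken from the Chicago data.

```python
LETTERS = "abcd"


def decode(row):
    """Turn one encoded answer string into (choice, is_correct) pairs.

    Encoding as described in the text: a letter a-d is a correct answer,
    a digit 1-4 is a wrong answer (1 = a, 2 = b, and so on), 0 is a blank.
    """
    decoded = []
    for ch in row:
        if ch in LETTERS:
            decoded.append((ch, True))
        elif ch == "0":
            decoded.append((None, False))  # blank: no choice marked
        elif ch in "1234":
            decoded.append((LETTERS[int(ch) - 1], False))
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return decoded


def num_correct(row):
    """Count the correct answers in one encoded answer string."""
    return sum(1 for _, ok in decode(row) if ok)


example = "3a12dadbcb4000"  # made-up string, not real student data
print(num_correct(example))
print(decode("30b"))
```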
If you guessed that classroom A was the cheating classroom, congratulations. Here again are the answer strings from classroom A, now reordered by a computer that has been asked to apply the cheating algorithm and seek out suspicious patterns.
Take a look at the answers in bold. Did fifteen out of twenty-two students somehow manage to reel off the same six consecutive correct answers (the d-a-d-b-c-b string) all by themselves?
There are at least four reasons this is unlikely. One: those questions, coming near the end of the test, were harder than the earlier questions. Two: these were mainly subpar students to begin with, few of whom got six consecutive right answers elsewhere on the test, making it all the more unlikely they would get right the same six hard questions. Three: up to this point in the test, the fifteen students' answers were virtually uncorrelated. Four: three of the students (numbers 1, 9, and 12) left more than one answer blank before the suspicious string and then ended the test with another string of blanks. This suggests that a long, unbroken string of blank answers was broken not by the student but by the teacher.
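The search for such a shared block can be sketched as follows. This is a simplified stand-in for the real cheating algorithm, whose details the text does not give: the function name `most_shared_block` is hypothetical, the five-student classroom is invented, and the sketch counts only runs made up entirely of correct answers sitting at the same position on each student's sheet.

```python
from collections import Counter


def most_shared_block(rows, width=6):
    """Return (count, start, block) for the run of `width` consecutive
    correct answers shared by the most students at the same position.

    Rows use the encoding from the text: letters are correct answers,
    digits are wrong answers or blanks.
    """
    counts = Counter()
    for row in rows:
        for start in range(len(row) - width + 1):
            block = row[start:start + width]
            if block.isalpha():  # all letters, so every answer is correct
                counts[(start, block)] += 1
    if not counts:
        return 0, None, None
    (start, block), count = counts.most_common(1)[0]
    return count, start, block


# A made-up classroom of five students, eleven questions each.
rows = [
    "a2c1dadbcb4",
    "1bc0dadbcb4",
    "ab30dadbcb4",
    "abcd1a2b300",
    "12c4da3bc20",
]
print(most_shared_block(rows))
```

In this invented classroom, three of the five students share the same six correct answers starting at the fifth question. Whether such a count is suspicious depends on how hard the questions are and how those students did elsewhere, which is exactly the reasoning laid out above.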
There is another oddity about the suspicious answer string. On nine of the fifteen tests, the six correct answers are preceded by another identical string, 3-a-1-2, which includes three of four incorrect answers. And on all fifteen tests, the six correct answers are followed by the same incorrect answer, a 4. Why on earth would a cheating teacher go to the trouble of erasing a student's test sheet and then fill in the wrong answer?
Perhaps she is merely being strategic. In case she is caught and hauled into the principal's office, she could point to the wrong answers as proof that she didn't cheat. Or perhaps - and this is a less charitable but just as likely answer - she doesn't know the right answers herself. (With standardized tests, the teacher is typically not given an answer key.) If this is the case, then we have a pretty good clue as to why her students are in need of inflated grades in the first place: they have a bad teacher.
Another indication of teacher cheating in classroom A is the class's overall performance. As sixth graders who were taking the test in the eighth month of the academic year, these students needed to achieve an average score of 6.8 to be considered up to national standards. (Fifth graders taking the test in the eighth month of the year needed to score 5.8, seventh graders 7.8, and so on.) The students in classroom A averaged 5.8 on their sixth grade tests, which is a full grade level below where they should be. So plainly these are poor students. A year earlier, however, these students did even worse, averaging just 4.1 on their fifth-grade tests. Instead of improving by one full point between fifth and sixth grade, as would be expected, they improved by 1.7 points, nearly two grades' worth. But this miraculous improvement was short-lived. When these sixth-grade students reached seventh grade, they averaged 5.5 - more than two grade levels below standard and even worse than they did in sixth grade. Consider the erratic year-to-year scores of three particular students from classroom A:
The three-year scores from classroom B, meanwhile, are also poor but at least indicate an honest effort: 4.2, 5.1, 6.0. So an entire roomful of children in classroom A suddenly got very smart one year and very dim the next, or, more likely, their sixth-grade teacher worked some magic with her pencil.
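The arithmetic behind that comparison is simple enough to check by hand, or with a few lines of Python. The class averages below come from the text; the function names and the flagging rule (a gain larger than the expected one grade level, followed by a decline) are assumptions made for illustration, not the study's actual criteria.

```python
def gains(scores):
    """Year-over-year changes in a list of grade-equivalent scores."""
    return [round(b - a, 1) for a, b in zip(scores, scores[1:])]


def spike_then_fall(scores, expected_gain=1.0):
    """Flag three years of scores where an above-expected gain is followed
    by a drop. The rule itself is an illustrative assumption."""
    first, second = gains(scores)
    return first > expected_gain and second < 0


# Class averages for fifth, sixth, and seventh grade, as given in the text.
classroom_a = [4.1, 5.8, 5.5]
classroom_b = [4.2, 5.1, 6.0]

print(gains(classroom_a), spike_then_fall(classroom_a))
print(gains(classroom_b), spike_then_fall(classroom_b))
```

Classroom A gains 1.7 points and then loses 0.3; classroom B gains 0.9 points in each year, just short of the expected full grade level but with no reversal.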
There are two noteworthy points to be made about the children in classroom A, tangential to the cheating itself. The first is that they are obviously in poor academic shape, which makes them the very children whom high-stakes testing is promoted as helping the most. The second point is that these students (and their parents) would be in for a terrible shock once they reached the seventh grade. All they knew was that they had been successfully promoted due to their test scores. (No child left behind, indeed.) They weren't the ones who artificially jacked up their scores; they probably expected to do great in the seventh grade - and then they failed miserably. This may be the cruelest twist yet in high-stakes testing. A cheating teacher may tell herself that she is helping her students, but the fact is that she would appear far more concerned with helping herself.
An analysis of the entire Chicago data reveals evidence of teacher cheating in more than two hundred classrooms per year, roughly 5 percent of the total. This is a conservative estimate, since the algorithm was able to identify only the most egregious form of cheating - in which teachers systematically changed students' answers - and not the many subtler ways a teacher might cheat. In a recent study among North Carolina schoolteachers, some 35 percent of the respondents said they had witnessed their colleagues cheating in some fashion, whether by giving students extra time, suggesting answers, or manually changing students' answers.
What are the characteristics of a cheating teacher? The Chicago data shows that male and female teachers are equally prone to cheating. A cheating teacher tends to be younger and less qualified than average. She is also more likely to cheat after her incentives change. Because the Chicago data ran from 1993 to 2000, it bracketed the introduction of high-stakes testing in 1996. Sure enough, there was a pronounced spike in cheating in 1996. Nor was the cheating random. It was the teachers in the lowest-scoring classrooms who were most likely to cheat. It should also be noted that the $25,000 bonus for California teachers was eventually revoked, in part because of suspicions that too much of the money was going to cheaters.
Not every result of the Chicago cheating analysis was so dour. In addition to detecting cheaters, the algorithm could also identify the best teachers in the school system. A good teacher's impact was nearly as distinctive as a cheater's. Instead of getting random answers correct, her students would show real improvement on the easier types of questions they had previously missed, an indication of actual learning. And a good teacher's students carried over all their gains into the next grade.
Most academic analyses of this sort tend to languish, unread, on a dusty library shelf. But in early 2002, the new CEO of the Chicago Public Schools, Arne Duncan, contacted the study's authors. He didn't want to protest or hush up their findings. Rather, he wanted to make sure that the teachers identified by the algorithm as cheaters were truly cheating - and then do something about it.
Duncan was an unlikely candidate to hold such a powerful job. He was only thirty-six when appointed, a onetime academic all-American at Harvard who later played pro basketball in Australia. He had spent just three years with the CPS - and never in a job important enough to have his own secretary - before becoming its CEO. It didn't hurt that Duncan had grown up in Chicago. His father taught psychology at the University of Chicago; his mother ran an afterschool program for forty years, without pay, in a poor neighborhood. When Duncan was a boy, his afterschool playmates were the underprivileged kids his mother cared for. So when he took over the public schools, his allegiance lay more with schoolchildren and their families than with teachers and their union.
The best way to get rid of cheating teachers, Duncan had decided, was to re-administer the standardized exam. He only had the resources to retest 120 classrooms, however, so he asked the creators of the cheating algorithm to help choose which classrooms to test.
How could those 120 retests be used most effectively? It might have seemed sensible to retest only the classrooms that likely had a cheating teacher. But even if their retest scores were lower, the teachers could argue that the students did worse merely because they were told that the scores wouldn't count in their official record - which, in fact, all retested students would be told. To make the retest results convincing, some non-cheaters were needed as a control group. The best control group? The classrooms shown by the algorithm to have the best teachers, in which big gains were thought to have been legitimately attained. If those classrooms held their gains while the classrooms with a suspected cheater lost ground, the cheating teachers could hardly argue that their students did worse only because the scores wouldn't count.
So a blend was settled upon. More than half of the 120 retested classrooms were those suspected of having a cheating teacher. The remainder were divided between the supposedly excellent teachers (high scores but no suspicious answer patterns) and, as a further control, classrooms with mediocre scores and no suspicious answers.
The retest was given a few weeks after the original exam. The children were not told the reason for the retest. Neither were the teachers. But they may have gotten the idea when it was announced that CPS officials, not the teachers, would administer the test. The teachers were asked to stay in the classroom with their students, but they would not be allowed to even touch the answer sheets.
The results were as compelling as the cheating algorithm had predicted. In the classrooms chosen as controls, where no cheating was suspected, scores stayed about the same or even rose. In contrast, the students with the teachers identified as cheaters scored far worse, by an average of more than a full grade level.
As a result, the Chicago Public School system began to fire cheating teachers. The evidence was only strong enough to get rid of a dozen of them, but the many other cheaters had been duly warned. The final outcome of the Chicago study is further testament to the power of incentives: the following year, cheating by teachers fell more than 30 percent.
You might think that the sophistication of teachers who cheat would increase along with the level of schooling. But an exam given at the University of Georgia in the fall of 2001 disputes that idea. The course was called Coaching Principles and Strategies of Basketball, and the final grade was based on a single exam that had twenty questions. Among the questions:
How many halves are in a college basketball game?
How many points does a 3-pt field goal account for in a basketball game?
What is the name of the exam which all high school seniors in the state of Georgia must pass?
a. Eye Exam
In your opinion, who is the best Division I assistant coach in the country?
a. Ron Jirsa
If you are stumped by the final question, it might help to know that Coaching Principles was taught by Jim Harrick Jr., an assistant coach with the university's basketball team. It might also help to know that his father, Jim Harrick Sr., was the head basketball coach. Not surprisingly, Coaching Principles was a favorite course among players on the Harricks' team. Every student in the class received an A. Not long afterward, both Harricks were relieved of their coaching duties.