FROM ISSUE NUMBER 19 ~ SPRING 2014

Can Government Replicate Success?

STUART M. BUTLER and DAVID B. MUHLHAUSEN

From time to time, a new idea in public policy will actually work. New York City might lower its murder rate. Wisconsin might move thousands from welfare to work. Indiana might provide low-income residents with reliable health coverage at a fraction of the usual cost. When these successes happen, the rest of the nation rightly takes notice. Southern, mid-size metro areas might look to the crime-fighting innovations of the northeastern metropolises. Dense, coastal states might look to anti-poverty advances in the Midwest.

But the task of mimicking and scaling up programs that work is not so straightforward. Success is never a simple matter of easily traceable cause and effect, and even the people who have achieved a breakthrough often cannot pinpoint exactly what worked and why. Social outcomes have an impossibly complex array of causes, and the circumstances that characterize one place are rarely identical — and are often not even very similar — to those found elsewhere. A seemingly successful preschool program in Chicago may fail in Atlanta, even if it is reproduced virtually identically, because of differences, both large and small, between the two cities.

So while the idea of replicating successful initiatives may seem like the epitome of empirical, social-science-driven public policy, replication itself actually has a fairly poor track record. In fact, the evidence suggests it does not work all that well.

This record does not mean, however, that policymakers should conclude that they cannot ever replicate success and should not try to learn from the achievements of others. Instead, they need to think about those successes, and about their own efforts to solve problems where they are, in experimental and incremental terms. They should see their work as a form of adaptive trial and error: Rather than simply try to mimic what worked elsewhere, they should strive to adapt successful strategies to their own situations.

When it comes to confronting deep and complex social problems, no one should ever expect a silver bullet or imagine that someone somewhere else has found one. But armed with skepticism and humility, policymakers can certainly learn from the experiences of others and can make incremental improvements that matter.

THE SINGLE-INSTANCE FALLACY

It is not always evident that a program has, in fact, succeeded in a meaningful way. It is not especially hard to measure good policy outcomes like higher test scores, fewer shootings, or increasing incomes. Rather, the challenge lies in showing that the policy in question is truly responsible for those improvements and then figuring out exactly how it worked. Policymakers cannot conduct controlled experiments with the ease and precision of laboratory science. Even when there are relatively well-isolated control and treatment groups and an array of metrics, and even when the most advanced evaluation techniques are put to use, it is terribly difficult to figure out which policy inputs lead to which social outputs.

But when it seems an instruction sheet for solving a problem is available, policymakers are naturally quick to take it up while ignoring the challenges of identifying the real causes of success. All too often this leads them to jump to conclusions — at great cost in taxpayer dollars and in wasted effort. A lack of due skepticism, and especially a lack of patience to wait for reliable, replicated results before moving ahead with a major new initiative, add up to a large part of the reason for the difficulty policymakers have had in scaling up success.

This rush to judgment can have serious and long-lasting implications. Leaping from a few local instances in which good outcomes followed certain policies straight to the implementation of those policies on a national level can be an extraordinarily expensive proposition. And once a policy is implemented, political inertia can keep large, costly, ineffective programs in place for decades.

Early-childhood education offers a good example of such pitfalls. Head Start, a federal program that funds preschool initiatives for the poor, was based on a modest number of small-scale, randomized experiments showing positive cognitive outcomes associated with preschool intervention. These limited evaluations helped trigger expenditures of over $200 billion since 1965. Yet the scaled-up national program never underwent a thorough, scientifically rigorous evaluation of its effectiveness until Congress mandated a study in 1998. Even then, the publication of the study's results (documenting the program's effects as measured in children in kindergarten, first grade, and third grade) was delayed for four years after data collection was completed. When finally released, the results were disappointing, with almost all of the few, modest benefits associated with Head Start evaporating by kindergarten. It seems the program had been running for decades without achieving all that much. Worse yet, the scant evidence of success has not stopped Head Start's budget from continuing to swell: The program cost $8 billion last year.

The billions of dollars of expenditures on Head Start have been defended for many years largely on the basis of just two small-scale evaluations: the HighScope Perry Preschool study, which began in the 1960s in Ypsilanti, Michigan, and the Carolina Abecedarian Project in Chapel Hill, North Carolina. James Heckman of the University of Chicago and his co-authors used advanced econometric methods to examine the Perry program, an early-childhood education initiative that primarily targeted African-American children. The researchers found that the program produced $7 to $12 in societal benefits for every dollar invested, the equivalent of a policy grand slam.

If early-childhood education in the inner cities yields such extraordinary benefits, one would think that a few small-scale reproductions of Perry and Abecedarian would turn up similar results. Yet not a single experimental evaluation of the Perry or Abecedarian approaches applied in another setting or on a larger scale has produced anything like the same results. Failure to replicate those successes on a local level should have indicated that the success of Perry and Abecedarian might well have been something of a mirage, or at least that its causes had not been properly understood. Instead, we have an ever-growing federal program with calls for additional early-childhood programs. [For a detailed analysis of Head Start, see "The Dubious Promise of Universal Preschool" by David J. Armor and Sonia Sousa.]

Job-training programs have followed a similar pattern. Once again, when early attempts showed some promise, they quickly became the templates for (and were used as arguments in favor of) heavily funded, large-scale replication efforts. And once again, disappointing results followed after billions of dollars had been spent.

The replication of Center for Employment Training programs is a classic example of this problem of overvaluing modest evidence. Based on the significant positive results of a 1992 evaluation of one center in San Jose, California, the federal government expanded the CET program across the nation. Twelve sites were later evaluated, and none of the results approached anything like those of the San Jose program. Not only did these nearly identical programs fail to increase the employment and earnings of participants, but evaluations showed that young men who participated in CET experienced declines in employment, earnings, and number of months worked.

These experiences with early-childhood education and job-training programs should teach policymakers to hesitate before replicating even seemingly successful efforts on a large scale. It is easy to be fooled by highly technical language or econometric instruments that mask unreliable results, and it is especially tempting for policymakers eager for answers to make too much of a single instance of apparent success in addressing a serious problem.

Policymakers can hardly be blamed for wanting to believe that evidence of success offers them a template, but they must strenuously resist jumping to conclusions. Before making large spending decisions, they need to know that their model for policy "success" has, in fact, been a success.

CAUSAL DENSITY

The difficulty policymakers have had in replicating one-time, small-scale successes is not just a function of impatience that leads eager reformers to jump to conclusions. It also points to a fundamental methodological challenge: the enormously complex social context in which policy successes (and failures) occur.

Jim Manzi, an entrepreneur and author of the 2012 book Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society, has coined the term "causal density" to describe this challenge, which he defines as "the number and complexity of potential causes of the outcome of interest." In other words, the problems being addressed by social programs have multiple causes that are not always fully understood, or perhaps even identified, by researchers, administrators, and policymakers. The many causes and interactions that lead to a measurable outcome in public policy are often impossible to disentangle, making it exceedingly difficult to show that the policy in question is the cause of the outcome or to grasp quite how it worked.

To understand the problem of causal density, it is useful to think about the sort of research done in the sterile environments of science laboratories or manufacturing facilities. The subject of such research, first of all, is relatively simple. Tissues can be reduced to their component cells, which are roughly identical to many other cells existing elsewhere in the world. Every chemical has a unique molecular structure that can be broken down and understood. In the hard sciences, the component parts of an experiment are often replicable. At the same time, such sciences benefit from the fact that it is possible to more or less hold equal all possible effects on the subject of the research. That is, after all, why research is conducted in sterile environments.

Social policy is necessarily very different. First, the subjects of the research are intelligent, adaptable, impossibly complex people. A researcher cannot isolate universal components, like cells or molecules. Second, it is impossible to hold equal everything that people believe, understand, have learned, and are otherwise influenced by during the course of an experiment. People are moved and shaped by effectively infinite influences, many of which could never be isolated or manipulated.

The problem of causal density is especially obvious in the Head Start conundrum: Assessments of the program's effectiveness involve studies of the life outcomes of human beings, all of whom have lives that are inevitably shaped by innumerable diverse causes. As Amy Lowenstein of New York University warns, for instance, "we must be cautious in drawing conclusions about crime effects based on the reductions in crime found in the Perry Preschool study, because there is no way to know if these effects were specific to Ypsilanti, Michigan, where the Perry Preschool was located, or if they would have emerged regardless of where the study took place." What if new police techniques, or an aging population, or less lead in the water significantly contributed to the crime reduction? What if there was a significant cause that we have not considered at all?

Assessments of any education program, in fact, must account for causal density. All such programs function within political and social environments that often can exert far stronger influences on students than can the educational program itself. A well-designed educational program that should be effective, for instance, may "fail" simply because it is unable to undo the damage done by distressed and unsupportive families and communities. And it may be impossible to tell whether those social conditions explain the failure of the program or whether a design weakness is really at fault. Alternatively, a program encountering particularly favorable family or community circumstances may yield a false positive. As former education secretary William Bennett once said, even a Soviet collective farm could look good in places where the soil was perfect and the sunlight plentiful.

As Manzi has illustrated, a classic instance of the challenge of causal density in action is the case of police departments making mandatory arrests in domestic-violence incidents. During the 1980s, criminologists Lawrence Sherman and Richard Berk analyzed how mandatory arrests in Minneapolis affected subsequent domestic-violence incidents. Their experiment found that, compared with less severe police responses, mandatory arrests led to significantly lower rates of domestic violence. Sherman and Berk urged caution in acting on that finding, but police departments across the nation eagerly adopted mandatory-arrest policies.

Once again, it turned out that what worked in one city did not work in other locations. Evaluations by Sherman and others found mixed results in other cities where the Minneapolis policy was implemented. In some cities, including Omaha, Milwaukee, and Charlotte, they actually found that mandatory arrests were linked to long-term increases in domestic violence. It appears that in some neighborhoods, certain abusive spouses, knowing they would automatically be arrested, became more violent, either to discourage a complaint to the police or to make spending a night in jail "worth it." Different cultural norms, economic circumstances, and social and political arrangements make it very hard to draw straight lines between cause and effect in different places.

Any successful replication has to cope with the fact of causal density. Policymakers can never isolate all the cultural causes of the problem they aim to address, and those causes exist in different degrees in different neighborhoods. Moreover, policymakers should not assume they can predict how human beings will respond to new policies. As the domestic-violence example shows, people can respond in surprising ways. Ultimately, successful replication requires that policymakers acknowledge how much they don't know.

THE TROUBLE WITH PILOTS

Even in those cases in which causation between a policy initiative and positive results can be fairly reliably established, however, effective replication remains very difficult thanks to what analysts refer to as the challenge of "implementation fidelity."

It is essential that would-be replicators of a carefully designed program actually understand and follow the theory and principles underpinning the original program; only then can they put the key components correctly into place. If the people trying to replicate the program don't fully understand those elements, or if they perhaps disagree with them, or if local conditions make certain features impossible to implement, then the model that met with success previously is very likely to fail when introduced in a new community.

Implementation fidelity is especially challenging when trying to scale up effective local enterprises into national programs. This is in part because national programs inherently face different circumstances. The larger scale itself means that the expanded program is quite different from the smaller one it seeks to copy. This may be a fundamental flaw in the common approach to evaluating new public policies, in which new policy ideas are tested out in designated "pilot programs," or demonstration projects, with the aim of vastly expanding those that seem to work.

For one, demonstration projects typically receive relatively high levels of funding and frequently operate under unusually favorable conditions, benefiting from advantages such as highly trained staff members or local communities that take great pride in having been chosen as a pilot site. But when projects based on demonstrations are rolled out on a wider scale, they generally cannot retain those advantages. Money and skills are inevitably spread thin. Intensity also attenuates: As Hirokazu Yoshikawa of New York University explains, early-childhood education demonstration projects often benefit from a "motivational difference, the desire on the part of many involved to demonstrate positive impacts." The staff of a pilot program, in other words, has the incentive to go the extra mile to get something to work. The staffs of later, larger programs usually lack this impetus. Local administrators who have the funding to implement a scaled-up version aren't fighting in the same way to prove policy success — it has already been established.

The way the public finances these social programs exacerbates these difficulties. When a demonstration project proves successful, the federal government often sets aside a pool of money to fund several replications. The first step to making these funds ready for distribution to new programs is to take what was learned in the demonstration project and turn it into a government "request for proposal," which inevitably falls into the form of a check-the-box grant application. These documents necessarily turn the careful techniques and subtle lessons learned at the demonstration level into rigid bureaucratic requirements. That transformation tends to lead to one of two problems: Either the essential elements are oversimplified, which makes it more likely that the replicated projects will depart from the original in fundamental ways, or the resulting technical checklist is made up of requirements so complex that it is difficult to adapt successfully to new local conditions.

The incentive structure inherent in federal funding also poses a problem. Local grant applicants often seek funding primarily as a means of assuring the flow of funds into their budgets and communities for what they deem to be most useful, rather than out of a commitment to adhere closely to the particular design of the program the grant is meant to fund. Meanwhile, the agency officials who dispense funds are often under pressure from their politically appointed superiors to get money out the door rather than demand faithful compliance with the original design of the demonstration.

The Office of Community Oriented Policing Services program provides a helpful example of how the grant-making process distorts policy replication. COPS was created in 1994 to provide state and local law-enforcement agencies with federal grants for hiring additional police officers who would engage in community policing. At the time, experiments in community policing were widely hailed as innovative and effective strategies to reduce crime. Indeed, much was learned from early test programs intended to increase the interactions between police officers and citizens.

After Congress appropriated funds for COPS programs across the country, politicians were eager to cut ribbons and be photographed handing over checks at police precincts. As a result, grants were dispensed quickly but not carefully. John Hart, former principal deputy director of the COPS program, has admitted that most grant applications were accepted even when the proposed policing activities did not fit the traditionally accepted definition of community policing. So it should not have been a surprise when a 2000 Justice Department study found that COPS grantees too frequently established partnerships with the community that were nominal and temporary.

By 2013, the Government Accountability Office estimated that "less than 20 percent of the applications funded in 2010, 2011, and 2012 contained evidence showing how additional officers would be deployed in support of community policing." Not surprisingly, the evidence suggests that the COPS program was on the whole quite ineffective at reducing crime.

BEING ADAPTABLE

The limitations of policy evaluation pose another major obstacle to any attempt to replicate an apparently successful program. Even if policymakers are intent on waiting for more than a few small examples of success before scaling up a model, and even if they appreciate the complexity of social causation and seek to overcome the limitations of the pilot-program model, they often understandably want to rely on careful evaluations of the effectiveness of policies and programs in order to know what is worth retaining and funding. Just as primary research is needed to formulate models in social policy, rigorous evaluation — bearing in mind the possible pitfalls noted above — is needed to help us better understand what may have actually happened during the implementation of a given model. But even genuinely rigorous evaluations of a model can be only so helpful in generating key information to guide replication.

In practice, replicating programs based on evaluation studies is typically a "vertical" process. The information flows up the policymaking chain of command from the location where it is collected. It is then used to design a policy template, which is then sent back down the chain as design criteria for other locations hosting new programs that aim to replicate the original one. The result, says New York University's Yoshikawa in the context of early-childhood education programs, is that mandated replication "consists of large-scale spending on program approaches that are dictated by government policy."

This approach risks encouraging the wrong attitude toward what policy experiments have to teach us. Rather than templates for expensive, full-scale "production line" program expansions, policy experiments should be understood as limited, local attempts to solve problems, which should be built upon with further careful experimentation. Apparent policy successes should not be copied; they should be adapted to new circumstances. Rather than looking for silver bullets, policymakers should adopt an empirical, experimental disposition.

Such an adaptable, experimental approach would involve more of a "horizontal" process than the usual vertical one. On an operational level, such a horizontal process would rely on what one might call "perception" information to guide adaptations of the original model — that is, it would rely on the sense of people on the ground to determine what modifications their particular circumstances require. Rigorous evaluations should still follow later to investigate the results of those adaptations. But the adaptive process itself cannot rely on such evaluations — to have any chance of working, the new program must take into account the many subjective and local factors that contribute to a sense of success and that should guide replication. The design of new versions of an original model is less likely to be successful if based only on the seemingly objective criteria suggested by academic evaluation techniques.

Perception information is valuable because it relies on the observations and experiences of those directly involved. Identifying good teachers in a school is an example of a process best guided by perception information. When a parent whose child attends a new school asks other parents who is a good and effective teacher, the parent obtains a range of more subtle and customized information than he could obtain by relying solely on test scores and pass rates. Information on teaching styles, the degree to which a teacher encourages children with challenges, and the general tone of the classroom are all forms of information that can be critical to "quality" — and which may differ widely in importance from one child to another. This kind of information is gleaned much more efficiently from perception information than from standardized data. And when putting a new policy initiative into effect at the local level, such information can make the difference between success and failure.

Not all useful information can be measured and charted. Even the best researchers necessarily give only a partial picture of whether and why a program works or fails. Policymakers need to invest less faith in researchers' pronouncements and make space for the vast, messy, evolutionary process of building upon past successes. Such a process will lack the stark confidence of more technocratic approaches, but it will ultimately make for better outcomes.

EVOLUTIONARY REPLICATION

The limitations of our traditional model of assessing and scaling up policy initiatives, combined with the importance of letting local judgments guide the implementation of new programs, point to a different way of thinking about replicating success. This alternative approach, decentralized and rooted in perception information, can be analogized to biological evolution. In both cases, the original model is adapted and modified in order to enhance the probability that its essential features will be reproduced in a variety of circumstances. As in evolution, long-term success under this approach to policy replication is achieved through continuous adaptation.

One advantage of evolutionary replication is that, because it never falsely extrapolates general truths from experimental results (which always arise from a specific context), it avoids making sweeping changes on the basis of half-truths or falsehoods. Rather than over-extrapolate from partial knowledge, it runs experiment after experiment, incorporating new insights as they are acquired. This evolutionary, trial-and-error style underpins the market approach to long-term economic success and explains its superiority to centralized, planned economic systems. The same logic can help policymakers learn from one another.

Therefore, rather than try the same policy again and again, policymakers should, like entrepreneurs, constantly adapt a general idea to local circumstances. Local adopters of a successful model need to be allowed to explore refinements of the original design, or to emphasize features that they consider more important than the original designers believed. To be sure, such flexibility can lead to poorer results and to failure, but even these experiences can lead to valuable lessons. Overall, increased local flexibility is likely to lead to a variety of improvements. And the costs of small, local failures will be far smaller than those involved in persisting for decades with national programs that cost billions of dollars and achieve scant results.

This evolutionary approach to policy replication requires a certain kind of regulatory and financial environment, and there are a few broad but concrete steps policymakers can take to allow for more adaptive experimentation.

First, they must be open to policy waivers of a particular sort. Administrative waivers have rightly earned a bad name among conservatives of late. They circumvent the steady and predictable rule of law, while the benefits of such lawlessness often redound to the cronies of those in power. Conservatives, however, shouldn't condemn waivers out of hand. The problem is not with the waivers but with their ad hoc deployment. Rather than set up centralized, standardized programs and then exempt the powerful and well connected, government should set up decentralized programs to begin with.

Obamacare provides a good example of what not to do on this front. It is a program designed to look like it encourages innovation and variety by letting local authorities experiment, but it is, in truth, highly centralized and prescriptive. The law's health-insurance exchange provisions, for instance, give states some freedom to administer the exchanges themselves and experiment with means of achieving the program's requirements, but the requirements are so tightly defined that this freedom amounts in most cases to little more than the states acting as agents of the federal government. Many states understandably opted not to take on such roles.

The designers of Obamacare generally did not envision an experimental, adaptive policy-replication process but rather a "silver bullet" process in which successful government-supported demonstrations become templates and are then nationalized through a combination of sticks and carrots. For instance, the law seeks to create so-called accountable-care organizations through a mixture of regulatory and financial incentives. Seen by some advocates as a powerful tool of efficiency, ACOs bring together health-care providers to serve groups of patients with payments linked to certain quality metrics. That is surely not a bad idea, and many ACOs are associated with improvements in quality or reductions in cost. It should be no surprise, however, that replicating such successes using a highly prescriptive and centralized process has proved difficult; the early effects have been mixed at best, and the growth of new ACOs has begun to slow down.

Rather than designing a program that first centralizes power and then grants some states and localities waivers at the discretion of federal officials, policymakers should pursue an approach that is designed from the start to encourage diverse approaches and experimentation. Where possible, they should dictate ends, not means, and let state and local officials work up from their particular circumstances to propose their own means and techniques. Rather than getting waivers from particular narrow requirements, for instance, states should be able to obtain waivers from some entire federal statutes in order to better achieve the goals of those statutes. Through such legislative waivers, states would be able to experiment with virtually any approach to, say, extending health coverage that complied with broad national goals and incorporated patient protections.

In addition to regulatory wiggle-room, evolutionary replication also requires certain financial incentives. The logic behind the traditional pilot-program approach says that the most cost-efficient way of financing a new program is to base it on what has worked in earlier models. But as we have seen, there are reasons why trying to carefully replicate even the most successful models is very challenging. It would be better to shift the focus from financing specific design inputs to rewarding the achievement of goals. This is a key premise of evolutionary replication: Those aspiring to replicate program success should focus on the goals to be achieved and the values to be enshrined, rather than on the precise original design. The original model is best seen merely as a prototype incorporating certain purposes and values.

An example of this key distinction in practice was the reform of state welfare programs across the nation that led to the 1996 federal reforms. Rather than thinking of welfare as a detailed set of carefully designed programs and benefits that had to be adhered to in every state throughout the nation, analysts and reformers began instead to focus on the purposes, incentives, and values that should be enshrined in a welfare system. One key element was the centrality of work or work experience as a reciprocal moral obligation for benefits received. Another was the goal of boosting people into independence rather than enabling long-term dependence — the idea that welfare should be temporary and not a way of life. To the reformers, the goals and values had to be central. The general design of benefits and incentives of the welfare system had to be aligned with those goals and values, even though they could differ around the country.

The result was a wave of state-level reforms that successfully achieved those goals. A 2001 Manhattan Institute analysis of welfare reform by June O'Neill and M. Anne Hill found that the passage of welfare reform in 1996 accounted for 50% of the dramatic drop in welfare caseloads within four years and more than 60% of the rise in employment by single mothers. Regrettably, the Obama administration now seems intent on weakening the work requirements that drove the reform.

Welfare reform was financed through a shared-savings approach, which focused on meeting set goals rather than identically replicating previously successful models. If a state could reach the goals more efficiently, it could keep the savings from previously projected federal welfare spending in the state.

Another way to finance a program and keep it innovative, adaptive, and decentralized is to shift the locus of financial control from the provider of services to the recipient of those services — who is thus transformed from a mere beneficiary into a consumer. A consumer has a powerful incentive to reward success, much more so than a bureaucrat. An example of this approach is vouchers in education, both in their pure form and in such hybrid forms as charter schools (where parents' choices determine which schools receive funds).

Putting the consumer of education, health care, and other publicly funded social services in control of financing in this way encourages successful evolutionary replication for two related reasons. One is that it spurs and rewards those adaptations that best satisfy the needs of beneficiaries in particular local circumstances. The other is that identifying and rewarding "success" in this way does not require a detailed set of criteria that may or may not actually measure genuine value to the beneficiary. Instead it is based on the perceptions of users of the service — generally a far more dependable measure in practice than boxes checked by the providers of the service.

TRIAL AND ERROR

The past several decades have provided us with a great deal of evidence about the possibility of effective policy innovation. There are some great success stories — in welfare, in law enforcement, in education — but there are also many examples of failure and disappointment. The pattern of these successes and (more often) failures should itself teach us a lesson about the limits of the technocratic approach to public-policy innovation.

The technocratic approach sees policy experiments as testing a concept that, once proven, can be broadly applied to solve a social problem. Adherents of this view are constantly searching for the recipe for the perfect program that will be as useful in Scarsdale as in San Antonio. But the world is too complicated for that, and the technocratic approach can work only by ignoring that complexity and pushing away exactly the kind of local, particular knowledge that might enable the adaptation of effective ideas to new circumstances.

The inadequacies of that approach point the way to an alternative: an evolutionary approach to policy experimentation that has lower expectations but therefore greater potential. By recognizing that only incremental steps are possible, and by orienting policymakers toward making the most of such gradual advances in knowledge, it can allow wise general ideas to take different forms in different places — and can allow us to learn from failure rather than ignore it. By accepting the fact that once-and-for-all solutions to daunting social problems are beyond us, we can instead foster an adaptive, evolutionary, decentralized system of policy experimentation and innovation. But rigorous evaluation is still needed for us to learn all we can from experimentation.

To embrace such a trial-and-error approach is to acknowledge that many of the ideas we try will ultimately fail. But these errors will be smaller and less costly than the ones we make now when we jump far too quickly from a small local success to a vast national program. A dose of humility and an appreciation for the immense diversity of the human experience would lead us toward an approach that learns from programs that have the right goals and are realistic about human vices and virtues, but leaves the details to those closest to the action.

In the evolution of policy, just as in the evolution of species, replication is inseparable from adaptation. The combination of the two takes time, experimentation, and failures. It can be awfully frustrating. But this approach also stands the best chance of yielding genuine and enduring successes, well adapted to the world in which we all must live.

Stuart M. Butler directs the Center for Policy Innovation at the Heritage Foundation. David B. Muhlhausen is a research fellow in the Heritage Foundation's Center for Data Analysis and author of Do Federal Social Programs Work? (Praeger, 2013).