
Fueling Nonprofit Innovation: R&D Vigor Trumps Randomized Control Trial Rigor

Author: Peter York, Stanford Social Innovation Review
Published Date: 16 August 2011

At the Social Impact Exchange conference in New York a few months ago, I heard the leader of a rapidly growing, national youth-serving nonprofit proudly declare that the organization was about to begin a costly randomized control group study to evaluate its programs. The chief executive described this as the “gold standard” for assessing and refining program effectiveness in the nonprofit sector. But should it be? Or—for on-the-ground program designers—should the gold standard be a research and development (R&D) approach to evaluation that pulls apart a program to figure out the nuts and bolts of what works, for whom, and under what conditions?

Most nonprofit leaders believe that a comparison group study, particularly one that uses a randomized control group, is the most valuable tool for evaluating program effectiveness. These rigorous research methods are used to prove whether a whole program (i.e., an intervention that has been completed and/or has crossed some dosage threshold that deems it fully implemented) produced better results for those who participated than for those who did not. If the participants’ group average on a measurable outcome is “significantly” higher than the non-participant group’s average, the program is considered successful. But it is important to note that a comparison study has to prove only that the group outcome average of those receiving the whole program experience is statistically higher than the group outcome average for those who did not participate, and that the difference in the average scores wasn’t due to chance.

Statistical significance in no way means that everyone in the intervention group succeeded! In fact, most statistically significant differences in comparison group studies are unremarkable when you look at the average scores of the intervention and control groups. For example, a research brief on Early Head Start, published on the US Department of Health and Human Services website, makes the following claim: “Early Head Start programs produced statistically significant, positive impacts on standardized measures of children’s cognitive and language development. When children were age 3, program children scored 91.4 on the Bayley Mental Development Index, compared with 89.9 for control group…” This finding may benefit funders and policy makers, but what is a program designer supposed to do with it? The statistically significant difference can’t be leveraged from a design standpoint; a designer can do little with the knowledge that participants scored 1.5 points higher on average.
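To see how a statistically significant gap can coexist with heavily overlapping outcomes, the short Python sketch below reworks the Early Head Start figures. It uses the two reported means, but the standard deviation (15 points, a common scale for standardized indexes) and the group size (1,000 children per group) are assumptions chosen for illustration, not figures from the brief, and the sketch treats scores as normally distributed.

```python
from math import sqrt
from statistics import NormalDist

# Means from the Early Head Start brief quoted above (age 3, Bayley MDI).
PROGRAM_MEAN = 91.4
CONTROL_MEAN = 89.9

# ASSUMPTIONS, not from the brief: both groups share a standard deviation
# of 15 points, and each group has 1,000 children.
SD = 15.0
N_PER_GROUP = 1000

diff = PROGRAM_MEAN - CONTROL_MEAN
se = SD * sqrt(2 / N_PER_GROUP)  # standard error of the difference in means
z = diff / se                    # large-sample (z) test statistic
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

program = NormalDist(PROGRAM_MEAN, SD)
control = NormalDist(CONTROL_MEAN, SD)
overlap = program.overlap(control)  # shared area under the two score curves
# Share of program children scoring above the control group's average.
above_control_mean = 1 - program.cdf(CONTROL_MEAN)

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
print(f"distribution overlap = {overlap:.0%}")
print(f"program children above control mean = {above_control_mean:.0%}")
```

Under these assumptions the 1.5-point gap clears the conventional 0.05 significance threshold (z of roughly 2.24), yet the two score distributions overlap by about 96 percent, and only about 54 percent of program children score above the control group’s average. The z approximation stands in for a t-test only to keep the sketch within Python’s standard library; with groups this large the two give nearly identical results.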

Comparative group designs do not lend themselves to the real-time learning and fast program adaptations demanded by the complex and tumultuous environment in which nonprofits operate today. Nor is this type of evaluation required to examine why some program participants do not achieve a desired outcome. To continually refine their programs, nonprofit leaders need to know much more, including which members of the group benefited, which did not, why, and the specific cause-and-effect relationships at work. And nonprofit leaders must be involved in interpreting the data. They cannot afford to be on the sidelines, waiting for a professional evaluator to collect data, draw conclusions, and write and deliver a report, while programmatic rigor mortis sets in.

In the private sector, R&D helps product and service designers analyze what is and is not working for different customers, understand the various contributing factors, and continually test new ways to serve more people better. What does an excellent R&D function look like in a nonprofit organization? Research begins by going straight to the source (the student, the theater-goer, the homeless person) to get direct feedback on how the program has affected their lives. It looks at which program elements worked and for whom, and designers pay attention to both strong and weak performance to decipher how particular program ingredients cause short-term results for specific subgroups. After preliminary, and oftentimes sophisticated, data analysis is completed, program leaders engage deeply in interpreting the data and spearhead the innovation or redesign process, with an evaluator in a technically supportive role. And the testing process is ongoing.
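The kind of breakdown described above, which asks what worked and for whom rather than whether the whole program worked, can be sketched in a few lines of Python. The records, subgroup labels, and program components here are hypothetical, and `success_rates` is an illustrative helper name, not part of any evaluation toolkit.

```python
from collections import defaultdict

# HYPOTHETICAL participant records: (subgroup, program_component, achieved_outcome).
# These values are made up for illustration; they are not data from the article.
records = [
    ("ages 14-15", "mentoring", True),
    ("ages 14-15", "mentoring", True),
    ("ages 14-15", "tutoring", False),
    ("ages 16-18", "mentoring", False),
    ("ages 16-18", "tutoring", True),
    ("ages 16-18", "tutoring", True),
]


def success_rates(records):
    """Share of participants achieving the outcome in each (subgroup, component) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [successes, total participants]
    for subgroup, component, achieved in records:
        cell = counts[(subgroup, component)]
        cell[0] += int(achieved)
        cell[1] += 1
    return {cell: hits / total for cell, (hits, total) in counts.items()}


for (subgroup, component), rate in sorted(success_rates(records).items()):
    print(f"{subgroup:<12} {component:<10} {rate:.0%}")
```

In this made-up data the overall success rate is 4 of 6, or about 67 percent, which looks respectable on its own; the cell-by-cell view shows that mentoring worked for the younger participants and tutoring for the older ones, the kind of finding a program designer can act on. A real analysis would need far larger samples before reading much into any single cell.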

Though these R&D practices can benefit nonprofits, few use them. Recently, TCC Group examined the aggregate results of over 2,500 nonprofit organizations across the country and found that only 5 percent of nonprofits engage in R&D practices. The study also discovered that organizations that use R&D practices are almost two and a half times more likely to grow at or above the annual rate of inflation, regardless of the size of the organization’s budget. In particular, the following R&D behaviors are uniquely and significantly correlated with organizational sustainability and growth:

• Gathering data directly from program recipients to determine how to improve services
• Determining outcome metrics by listening to, documenting, and sharing actual client success stories and results
• Engaging key leaders and staff in interpreting the client-derived data
• Evaluating a program to figure out what aspects of it work, rather than whether the program as a whole makes an impact
• Bringing program design leaders together to assess and address the resources needed to deliver programs effectively
• Leveraging R&D insights to inform the program implementation team

Compared to rigid social science methods, R&D can help more nonprofits learn, innovate, and reach goals faster—for less money. It is also a more pragmatic way to assess program design replicability and costs, and to develop better business models that support the realistic expansion of high-impact nonprofit programs. To accelerate the scaling of social innovation over the next few years, nonprofits need to rely less on the rigor of academic experimental design and more on the vigor of R&D.
