Joaquin - wow, I love this, and it is nice to learn from someone who has grappled with USAID - and evaluations - much more than I have. This comment thread is still a suboptimal way to discuss, but it’s better than Twitter, so:
Your thoughts on how USAID might go about this in your paragraphs 2 and 3 are music to my ears.
Paragraph 4: I didn’t realize that some areas of USAID programming - health, as you mention - are much better than others. I do think that USAID is doomed in terms of real change until it begins to integrate the funding of good interventions and good organizations committed to taking them to scale (and that, of course, includes evaluations). USAID could do so much more to ensure that really good stuff scales to achieve its potential.
Your next point about cash - that it represents a one-time investment that mimics the one-time investment in a project - makes sense, except for the durable impact piece. I simply don’t think we should do anything in development - as opposed to humanitarian aid - unless we can make at least a theoretical case for lasting impact. Cash is a good comparator if you’re only looking at short-term impact, but I still don’t think there’s a persuasive case for lasting impact. The hope in a decent employment program is that you’ve armed some people with the skills to continue getting employed, and if so, cash can’t match it.
I don’t think we disagree at all on the need to measure multiple things - it’s just that in the end, some matter way more than others. Ultimately, we - or at least One Acre Fund, if they want to get better at what they do - need to know all the things that you list. However, their mission is to get farmers out of extreme poverty, so the make-or-break metric is profit. Nothing else matters unless farmer income increases - not yield, not anything. We go to great lengths to determine the real mission of an intervention - the eight-word mission statement. We’d argue that the point of employment - job or business - is income, so that would be the make-or-break metric. I don’t think there is any justification for calling something a partial success if impact in terms of the key metric is zero.
I don’t think that cash does harm, and I think it is always better than nothing, especially if getting that nothing involves a person’s investment of time and effort into an ineffectual program. I’m curious, though: what if the programs had shown a little bit of impact? What then? In the case of nutrition, the cash had zero effect on nutrition, but way more overall short-term benefit. Does cash have to answer to the same mission - would this mean that the program “won”? Or do we somehow try to compare apples and oranges, which seems like a mess. The only plausible benchmark here is to compare the nutrition program to another one that did succeed. If you don’t have good evidence about existing programs, then your all-out priority should be to get some. In almost any sector, there is somebody somewhere doing a great job.
Finally, I so agree with your overall message and approach, and I’d add one thing - you mention five-year projects, and I gather that while there are midline measures, there’s not much ongoing iteration. I tell our fellows that you should never go into an RCT unless you already know the answer from your own well-designed systems. It really is inexcusable to take five years to find out that you didn’t even have short-term benefits. While I’ve seen a lot of comments from the evaluator community that organizations that self-measure are always wrong, we’ve seen that well-designed systems can get a valid sense of impact. Granted, it’s often a bit less than what is determined by an outside evaluator, but it’s rarely that meaningful a difference.
Probably more than you asked for, but fun to dive into - thanks.
This is great, Kevin! I really appreciate your deep engagement and thoughtful responses. I have already taken up a lot of your time (and I know you're a busy person!), but suffice it to say we basically agree about how you'd judge the success of an intervention on its own merits. So I'll just share a couple more reflections about the institutional context at USAID in response to your comments.
I couldn't agree more about the need for piloting, regular testing, and iteration before evaluation. If I could criticize myself for my time at DIV, I would say I pushed some of our grantees towards RCTs before they'd really figured out their operational model. Another problem at USAID, however, is that programs are trying to spend money fast and reach as many 'beneficiaries' as possible before they've really figured out the most effective/efficient approach to delivering their intervention.
But the main problem, and the goal of this work, is not to promote cash transfers but to push for more rigorous evaluation within the agency. The reason relates to your reservation about cash as a comparator to typical programs. You make the very sensible point that USAID shouldn't do things that don't at least theoretically improve long-term outcomes for people in developing countries. The problem at USAID is that most program officers mistake the THEORETICAL case for impact for ACTUAL impact. This leads to a pervasive attitude of "why evaluate something we know theoretically works?". I know it sounds crazy, but I can't tell you how many times I was told exactly that by people in USAID missions around the world when we were trying to set up benchmarking studies (I pitched about 20-25 missions).
Even once the results of the nutrition study came in, there was a kind of magical thinking on the part of many USAID staff that the benefits that failed to materialize in the short term would somehow emerge in the long term. But if your income, health, dietary diversity, or child anthropometrics don't improve within 10 months of the completion of a program that will not provide any services to you in the future, I struggle to see how those outcomes will improve in the long term, especially given what we know about the importance of the first 1,000 days of a child's life (the time period targeted by the intervention). Indeed, if rigorous evidence of long-term impacts is rare in international development, programs that have no short-term impact but somehow achieve long-term impacts would be even rarer!
So we really wanted to fight the 'received wisdom' at the agency that USAID programs are, as a general proposition, effective. There is little to no evidence to suggest they are, because USAID does not rigorously evaluate its own programs. And the evidence that USAID does generate is often suspect. Akazi Kanoze, the predecessor program to Huguka Dukore (the employment program from the second study), was "rigorously evaluated": EDC ran an internal RCT - an RCT run by the implementing organization itself - suggesting that the program increased employment relative to non-recipients. Then the "performance evaluation" that was essentially contemporaneous with the cash benchmarking study suggested that the program was improving employment outcomes.
But the independent and more rigorous evaluation of the cash-benchmarking trial showed it to be ineffective in improving employment or income! USAID does LOTS of 'internal' and 'performance' evaluation, but almost NO independent evaluation. These studies hopefully show why this kind of evaluation is necessary, cash-counterfactual or not!
I share all of this to paint a picture of just how much the culture and existing practice of generating and using evidence at the agency is stacked against a) rigorous and independent evaluation, and b) using evidence to CHANGE programming.
Thanks again for engaging on this. I hope these reflections are helpful!