Evaluation

Reliable evidence of impact

Donor agencies face serious criticism because there are too few convincing cases of developmental success. It would be unfair to blame a single field of policymaking for all the world’s ills without taking into account domestic governance, trade, security or technology issues, of course. Nonetheless, decision makers on development matters must face the fact that, so far, they have not looked at the actual impact of their efforts in a sufficiently systematic and critical fashion.


[ By Jörg Faust ]

Development policy has great aspirations. Core aims include contributing to reducing poverty, promoting democracy and safeguarding peace. Evaluation is about checking whether such aims are met. Assessment of policy results basically serves three purposes:
- learning, in the sense of understanding which interventions cause which effects, which is useful to everyone interested in improving development projects,
- control, in the sense of accountability to parliaments, taxpayers and the public in general, and finally
- legitimacy, because an emphasis on impact and transparency boosts any institution’s credibility.

The micro-macro paradox

An important strand of impact analysis is macro-quantitative analysis, which relies on econometric country comparisons (see box). So far, however, the results have been sobering. They show that, as of the beginning of this decade, higher aid expenditures had had no statistically robust impact on growth or poverty reduction.

In contrast, numerous assessment reports from bi- and multilateral institutions claim that the results of most measures are satisfactory. Obviously, such success at the micro level has not gone hand in hand with the desired societal change overall. Hence there is talk of a micro-macro paradox.

How does one explain this contradiction? One answer is that the sum of indirect and unintended effects of individual projects can cancel out desired effects at the societal level. Two channels that provoke such unintended effects are well understood:
- If a country’s economy depends heavily on ODA, the influx of foreign exchange will cause the country’s currency to appreciate. Imports then become cheaper and exports more expensive, and the country’s global competitiveness suffers accordingly.
- A high dependency on ODA also has a negative impact on a country’s governance. Apparently, ODA partially frees up public funds that can then feed clientelist structures. Comparative analyses, moreover, back the assumption that negative impacts on governance are compounded if donor agencies act in an uncoordinated manner. Uncoordinated donors burden developing countries with substantial coordination and administrative costs, and entice qualified staff to leave the civil service and work for donor agencies instead.

Another explanation of the micro-macro paradox concerns the quality of assessment. From this perspective, there is a lack of reliable empirical evidence for the success stories agencies boast of. Though the vast majority of development initiatives are evaluated, many institutional and organisational obstacles persist – and so do problems related to methods and policy goals.

The following points belong in the first category of criticism:
- Too often, assessment has not been well integrated into a given project. Evaluation must be planned from the very beginning in order for results to have any bearing on implementation.
- Donors have tended to conduct assessments on their own, thus hampering joint learning with the local partners.
- Evaluators – whether freelance consultants or in-house experts – too often depend on the development agencies they work for, and have therefore regularly been unable to fully perform their control function.

Examples of methodological and policy-related challenges include the following points:
- Conventional assessment methodology has put too much emphasis on inputs and outputs, without sufficiently taking account of societal impacts.
- Initial data collection is often poor. In many cases, there is no baseline study at the beginning, though such a study would allow for methodologically stringent comparisons later.
- So far, evaluations have not sufficiently worked on a counter-factual basis. Yet the real impact of any single measure can only be assessed with comparative data that show what would have happened had it not been taken. Simple before-and-after comparisons often lead to false conclusions because there is no way to check the impact of other influencing factors, as the sketch below illustrates.
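
A minimal numerical sketch may make the point concrete. All figures are invented for illustration; they do not come from any actual evaluation:

```python
# Invented figures: average household income in villages with a project,
# measured before and after the intervention.
project_before, project_after = 100.0, 120.0

# Comparable villages without the project also improved over the same
# period, e.g. because of a good harvest.
comparison_before, comparison_after = 100.0, 115.0

# A naive before-and-after comparison attributes the whole change to the project.
naive_estimate = project_after - project_before  # 20.0

# A counter-factual comparison subtracts the change that would have
# happened anyway, as observed in the comparison villages.
counterfactual_estimate = naive_estimate - (comparison_after - comparison_before)  # 5.0

print(f"Before-and-after estimate: {naive_estimate:.0f}")
print(f"Counter-factual estimate:  {counterfactual_estimate:.0f}")
```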

Counter-factual tests

At the international level, there is a gradual trend towards more rigorous methods of impact assessment. Such methods are based on statistical, counter-factual procedures, which serve to establish causality between interventions and possible effects. Randomised experiments are increasingly considered the methodologically most convincing approach. To implement such an experiment, intervention and control groups are randomly defined at the very beginning of the project (see box). The planned intervention is implemented only in the former group, not in the latter. Randomised experiments have been used for assessing innovative medicines as well as social-policy measures in OECD nations.

At the beginning and at the end of such an experiment, the relevant data are collected in each group, which allows for meaningful before-and-after comparisons as well as comparisons between the groups. If the groups are randomly selected and the samples are large enough, the average differences in the groups’ development can be attributed to the intervention. For assessments to be precise, it is important to prevent interaction between the groups as far as possible.
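
To make this logic concrete, the following sketch simulates such an experiment with invented data; the sample size, income figures and effect size are assumptions chosen purely for illustration:

```python
# Sketch of how a randomised experiment can be analysed, using simulated data.
# Households are randomly assigned to an intervention or a control group;
# the outcome (here: household income) is measured at baseline and endline.
import random
import statistics

random.seed(42)

# Hypothetical baseline incomes for 200 households
households = [{"baseline": random.gauss(100, 15)} for _ in range(200)]

# Random assignment: roughly half the households receive the intervention
for h in households:
    h["treated"] = random.random() < 0.5

# Simulated endline incomes: everyone gains a little, treated households more
for h in households:
    effect = 10 if h["treated"] else 0
    h["endline"] = h["baseline"] + 5 + effect + random.gauss(0, 5)

treated = [h for h in households if h["treated"]]
control = [h for h in households if not h["treated"]]

# Because assignment was random, the simple difference in mean endline
# outcomes estimates the average impact of the intervention.
diff_in_means = (statistics.mean(h["endline"] for h in treated)
                 - statistics.mean(h["endline"] for h in control))

# Using the baseline data as well (difference-in-differences) typically
# makes the estimate more precise.
diff_in_diff = (statistics.mean(h["endline"] - h["baseline"] for h in treated)
                - statistics.mean(h["endline"] - h["baseline"] for h in control))

print(f"Difference in means:       {diff_in_means:.1f}")
print(f"Difference-in-differences: {diff_in_diff:.1f}")
```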

In spite of their potential, randomised experiments are often criticised. Some argue that, for normative reasons, it is unacceptable to prefer test groups, which will enjoy the benefits of an intervention, over disadvantaged control groups. As ODA resources normally do not suffice to reach everyone in need, however, it is justifiable to make such distinctions at random in order to better understand what works well. In addition, such experiments make particular sense wherever a programme is to be expanded at a later stage. In such cases, evaluation results from the first project phase can be put to good use in later phases.

Another point of criticism is that randomised experiments do not consider context-specific issues. This is indeed so unless the experiments are embedded in a qualitative contextual analysis. In principle, randomised experiments are superior to qualitative methods (such as participant observation or narrative interviews) when it comes to measuring impact. However, they only make sense when based on a careful analysis of the context before measures are taken. Unless qualitative methods are used to understand the dynamics of causes and effects, it is normally impossible to pick sensible indicators for quantitative research. Accordingly, a qualitative analysis of the intervention’s context should again take place after a randomised experiment in order to understand how the proven causal relations work.

It is frequently argued that randomised experiments are expensive and require intensive planning. That is true, but only a limited number of projects will be chosen for this evaluation method in any case. A solid assessment, moreover, cannot be handled in passing.

Another merit of randomised experiments is that they force implementing agencies to consider evaluation issues in depth before starting a project or programme. Randomised experiments help to make measurement a core component of any effort. Assessment is thus no longer seen as a burdensome, isolated and compulsory exercise at the end of a project. Finally, it is comparatively difficult to manipulate the results of randomised experiments, which enhances the transparency of evaluation.

It is noteworthy, however, that randomised experiments are pertinent only for certain types of interventions. The tool is of no use for assessing the results of budget support, advice to governments or policy dialogue at the national level. It serves to evaluate the results of reform implementation at the municipal or household level, but not to assess a reform process and its origins as such.

Paris, Accra and the future

The Aid Effectiveness Agenda, which was formulated in Paris in 2005 and updated in Accra last year, calls for “management for results” and “mutual accountability”. Evaluation will thus become even more important in international development cooperation in future. This foreseeable trend has contributed to bringing about two international initiatives:
- The Network of Networks of Impact Evaluation (NONIE) was launched in late 2006. It basically consists of representatives of donor agencies and scientific evaluation institutions. The degree of its institutionalisation is limited, with the secretariat hosted by the World Bank’s Independent Evaluation Group. The idea is to identify and promote high methodological standards. However, a dispute is still going on within the network between the supporters of qualitative-participatory methods and those of predominantly quantitative approaches.
- In 2008, the International Initiative for Impact Evaluation (3IE) was founded; it, too, still works on a network basis. It originated from an initiative of the Center for Global Development in Washington. Members include some bilateral donors, large NGOs such as the Gates Foundation and a few aid-recipient countries. The still nascent 3IE emphasises counter-factual evaluation and intends to use a bidding process for funding impact studies. One explicit goal is to build evaluation capacities in developing countries through training and consulting services.

Other interesting developments include a trend towards several actors carrying out joint assessments in the context of donor harmonisation and programme-based funding. Doing so should facilitate learning in developing countries as well as help to reduce transaction costs.

For good reason, however, the Paris Declaration on Aid Effectiveness highlights the ownership of the developing countries. That means that, to the extent feasible, they should play a growing role in implementing and guiding evaluations.

Therefore, it is crucial to strengthen evaluation skills in developing countries. In the long run, that will allow them to initiate their own learning and innovation processes, and to back up donor initiatives with the necessary degree of empirically based constructive criticism.

Box: Econometric country comparisons

Multivariate regression analysis is a well-established tool for comparing countries in econometric terms. Economic and social indicators (such as economic growth, household income or infant mortality) are usually chosen as dependent variables. Such data exist for a large number of developing countries. On this basis, it is possible to check whether official development assistance (ODA) has made a significant difference while controlling for other potential factors of influence.

It is true that, in many countries, national statistics still leave something to be desired. Nonetheless, cross-country comparisons make sense. This method makes it possible to assess a variety of different goals and causes. If guided by plausible analytical considerations, it can help significantly to prove causal relationships.

Econometric comparison is counter-factual. It allows experts to check the effects of interventions in different doses. Moreover, the methodology is quite transparent, because analyses are normally based on data that are available to the public. Anyone with the relevant methodological skills can review, complement and improve such studies.
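
As an illustration of what such an analysis involves, the following sketch estimates a cross-country growth regression on invented data; the variables, coefficients and the use of the statsmodels library are assumptions made for this example, not a reproduction of any published study:

```python
# Sketch of a macro-quantitative (cross-country) regression with simulated data.
# In real studies the variables would come from national accounts, household
# surveys and donor statistics.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_countries = 80

# Hypothetical explanatory variables per country
oda_share = rng.uniform(0, 15, n_countries)     # ODA as a share of GNI (%)
investment = rng.uniform(10, 35, n_countries)   # investment rate (%)
governance = rng.normal(0, 1, n_countries)      # governance index

# Hypothetical dependent variable: average annual growth rate,
# driven here by investment and governance but not by ODA
growth = 1.0 + 0.15 * investment + 0.8 * governance + rng.normal(0, 1.5, n_countries)

X = sm.add_constant(np.column_stack([oda_share, investment, governance]))
results = sm.OLS(growth, X).fit()

# The coefficient on oda_share (and its significance) indicates whether ODA
# is associated with growth once other influences are controlled for.
print(results.summary(xname=["const", "oda_share", "investment", "governance"]))
```

Whether the coefficient on the aid variable proves statistically robust across such specifications is precisely what the debate about the micro-macro paradox turns on.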

However, the complexity of econometric methods is a disadvantage. They require considerable expertise. Therefore, it is sometimes difficult for specialists to communicate their insights to policymakers. (jf)
