Avoiding Common Errors with Pre-Post Survey Estimates
A common way to try to estimate a program’s impact is to survey participants before and after the program. That approach can produce reasonably valid estimates in the right context, but I have often seen organizations and people (including professional evaluators) misuse pre-post surveys.
The right context is when:
Both the pre-survey and the post-survey have very high response rates, or non-responses occur at random (e.g. incorrect/old email addresses, people being sick, on vacation, busy, etc.).
It is reasonable to assume that your program is the only plausible explanation for the observed change in participants’ responses.
An example of a situation that fits those criteria would be a workshop that teaches participants about something that they are unlikely to learn outside of the workshop. For example, learning about impact measurement and evaluation isn’t exactly commonplace. If I offered an evaluation workshop to post-secondary students, and I could confirm that they were not enrolled in other evaluation courses during the same time, I would have confidence in a pre-post estimate of my workshop’s impact on students’ evaluation knowledge.
Unfortunately, there are many situations where the criteria for accurate pre-post estimates are not met. Often, there are other reasonable causes for the changes observed. For example, an evaluation of having School Resource Officers in schools measured students’ anxiety and fear of being bullied at the start and end of grade nine. The authors’ theory was that prior to grade nine students had not experienced having cops in their schools, so changes in anxiety or fear of being bullied would be due to now having School Resource Officers in their schools. However, I would argue that the real reason is that starting grade nine is stressful, there are stories and fears of being bullied or hazed, and those feelings naturally decrease as students settle into their new school environments. The misattribution of the changes to police presence is something that oppressed students are still fighting against today.
Another issue for many programs is that a significant number of participants do not complete (or drop out of) the program. Often those who drop out are different in some way (e.g. more instability in their lives, the program isn’t working well for them, harder to serve and stay in touch with, etc.). That differential non-response at post will result in the average scores at post being different from the average scores at pre. That difference is not because of the program; rather, it is because the mix of participants is now different from what it was at pre.
For example, let’s say we had ten participants that rated their mental well-being at pre on a 1 to 5 scale (with 5 being best) as follows:
Their average pre-score is 3.4.
Now, let’s assume that two participants with poorer mental well-being did not complete the program, and the post results end up as:
The average post-score for those that responded is 3.75, which is 0.35 higher than at pre.
However, the increase in the average score is mainly because the post data include fewer participants with poorer mental well-being.
To address the drop-out, you need to match participants’ pre and post scores, and analyze the differences in scores just for the participants that completed both a pre and post survey. Our example would look like:
The average difference in the scores is -0.125.
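To make the arithmetic concrete, here is a minimal Python sketch of the unmatched and matched comparisons. The individual scores and the participant IDs are made-up values that I chose only so that the summary figures match the ones quoted in this example (3.4, 3.75, and -0.125). Treat them as an assumption, not as the original data.

```python
from statistics import mean

# Hypothetical pre scores keyed by participant ID (illustrative values only).
pre = {"P01": 1, "P02": 2, "P03": 3, "P04": 3, "P05": 4,
       "P06": 4, "P07": 4, "P08": 4, "P09": 4, "P10": 5}

# Hypothetical post scores. P01 and P02 dropped out, so they have no post score.
post = {"P03": 3, "P04": 3, "P05": 4, "P06": 4, "P07": 4,
        "P08": 4, "P09": 4, "P10": 4}

# Unmatched comparison: average of everyone at pre vs. everyone at post.
pre_mean = mean(pre.values())
post_mean = mean(post.values())
unmatched_change = post_mean - pre_mean

# Matched comparison: keep only participants with both a pre and a post score,
# then average each person's own change.
matched_ids = sorted(pre.keys() & post.keys())
differences = [post[pid] - pre[pid] for pid in matched_ids]
matched_change = mean(differences)

print(f"Pre mean (n={len(pre)}): {pre_mean:.2f}")
print(f"Post mean (n={len(post)}): {post_mean:.2f}")
print(f"Unmatched change: {unmatched_change:+.2f}")
print(f"Matched change (n={len(matched_ids)}): {matched_change:+.3f}")
```

With these illustrative values, the unmatched comparison suggests a gain of about 0.35, while the matched comparison shows a small average decline. The only thing that differs between the two is who is included in the pre average.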
This made-up example would estimate that the program did not have an effect on the participants that completed both a pre and post survey. That would be a reasonable statement, but it is important to recognize that we cannot easily assume that finding applies to all of the participants that the program serves. With our example, what if the missing participants actually made significant gains in their mental well-being? Then, the program may have actually helped improve some people’s mental well-being, but that would not be captured in the matched pre-post results.
Be very careful about claiming definitively that changes observed from pre-post surveys are due to your program if there are other reasonable explanations. Involve your participants, stakeholders, and critics in assessing the reasonableness of those alternative explanations.
If response rates are lower than 95% at pre or post, and there is a pattern in the missing data, then you need to 1) match participants’ pre and post data, and 2) gather as much information as you can on the likely reasons for the missing data and on the missing participants’ likely results. Gathering that information will help you make reasonable assumptions as to whether the matched pre-post results would under-estimate, over-estimate, or accurately estimate your program’s impact for the entire group.
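As a rough sketch of that first check, the Python below computes response rates against the 95% guideline and compares the pre scores of participants who did and did not complete the post survey. The scores, the participant IDs, and the enrolment count are hypothetical, and comparing pre averages is only one simple way I might look for a pattern. It does not replace finding out why people are missing.

```python
from statistics import mean

ENROLLED = 10          # hypothetical number of people who started the program
MIN_RESPONSE = 0.95    # the 95% guideline from the text

# Hypothetical pre scores and the IDs that also returned a post survey.
pre = {"P01": 1, "P02": 2, "P03": 3, "P04": 3, "P05": 4,
       "P06": 4, "P07": 4, "P08": 4, "P09": 4, "P10": 5}
post_ids = {"P03", "P04", "P05", "P06", "P07", "P08", "P09", "P10"}

pre_rate = len(pre) / ENROLLED
post_rate = len(post_ids) / ENROLLED

# Flag the data set if either survey falls below the guideline.
needs_follow_up = pre_rate < MIN_RESPONSE or post_rate < MIN_RESPONSE

# Look for a pattern: did those who dropped out start from a different place?
completers = [score for pid, score in pre.items() if pid in post_ids]
dropouts = [score for pid, score in pre.items() if pid not in post_ids]

print(f"Pre response rate: {pre_rate:.0%}, post response rate: {post_rate:.0%}")
print(f"Below the 95% guideline: {needs_follow_up}")
print(f"Pre mean of completers: {mean(completers):.3f}")
if dropouts:
    print(f"Pre mean of drop-outs: {mean(dropouts):.3f}")
```

A large gap between the two pre averages, as in these made-up numbers, is a hint that the missing data are not random. It says nothing about how the missing participants would have scored at post, which is why the follow-up information matters.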