Wow, this week we can talk about anythiiiiiiiiiiiiiing we want to (stats related, of course), so I have been inspired by this week’s lecture and I’m going to chat about outliers. Are they important? In case you are unsure what I mean by “outlier”, you can basically sum one up as something that is very different from everything else. (Like this guy in the image below!!)
So in terms of statistics, an outlier is a data point that is extremely numerically different from all the other data points in the sample. It doesn’t follow any of the patterns that the other results may show, and it can really change a researcher’s results because of the impact it has on the mean! When we’re doing psychological experiments, outliers can occur for many different reasons. It could be due to the researcher: did they not have a big enough sample? Were there flaws in their design? Did they measure incorrectly? The outlier could also come from the participant! Was the participant not listening to the instructions for the task? Did they fake their answers? … Or was it complete chance?
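If you want a rough way of putting a number on “extremely different”, one common rule of thumb (definitely not the only one!) is to flag anything more than 1.5 × the interquartile range beyond the quartiles. Here is a minimal sketch of that idea. The scores are made up for illustration, and the function name `iqr_outliers` is just something I’ve invented for this example.

```python
import statistics

# Hypothetical hyperactivity scores (made up for illustration).
scores = [4, 6, 5, 3, 5, 5, 3, 4, 6, 4, 20]


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


flagged = iqr_outliers(scores)
print(flagged)
```

A rule like this only tells you which points look unusual. It can’t tell you why they are unusual, so the questions above still need answering.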
One thing to consider with outliers is: is it wrong to remove them from your data? Some may say it is, because you are messing with the natural results of the study.
However, I think in some circumstances it is the correct thing to do!
Having outliers in a set of data can have a really dramatic influence on the outcome of the study. The value obtained for the correlation can be seriously affected. You may end up thinking your research has been really successful (or really disastrous) just because one single participant changed the mean!
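To see how much one participant can drag the mean around, here is a quick sketch using made-up scores (these are not from any real study). I’ve included the median as well, purely as a comparison, because it barely reacts to a single extreme value.

```python
import statistics

# Hypothetical scores from five participants (made up for illustration).
scores = [4, 5, 5, 6, 5]

# The same data with one extreme participant added.
scores_with_outlier = scores + [35]

mean_without = statistics.mean(scores)
mean_with = statistics.mean(scores_with_outlier)

median_without = statistics.median(scores)
median_with = statistics.median(scores_with_outlier)

print("mean:  ", mean_without, "->", mean_with)
print("median:", median_without, "->", median_with)
```

With these numbers the mean doubles from 5 to 10 because of one person, while the median stays at 5.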
Here’s an example experiment to show you what I’m rambling on about.
My example experiment tests the levels of hyperactivity in children against how many fizzy drinks they have consumed that day (I’ve used the exact same numbers from a book* to make sure I get the sums right haha).
The first image shows a set of data points where the correlation is almost 0 (r = -0.08), meaning there isn’t really a relationship between the two, so the number of fizzy drinks most likely doesn’t affect a child’s hyper behaviour. An outlier has been added to the second dataset, and there is a huuuuuuge change in the correlation value! Due to this ONE participant, the correlation is now r = 0.85, suggesting there is a strong positive correlation and that fizzy drinks do affect how hyper a child behaves! So the entire outcome of the experiment has changed just because of this one person. Is that fair?
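If you’d like to try this yourself, here is a minimal sketch that works out Pearson’s r with and without one extreme participant. Important: these are NOT the book’s numbers. I’ve made up a small dataset for illustration, so the r values won’t match the -0.08 and 0.85 above, but the same kind of jump shows up. The helper name `pearson_r` is my own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (made up): fizzy drinks consumed and hyperactivity score.
drinks = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
hyper = [4, 6, 5, 3, 5, 5, 3, 4, 6, 4]

r_without = pearson_r(drinks, hyper)

# Add ONE extreme participant: lots of drinks and a very high score.
r_with = pearson_r(drinks + [15], hyper + [20])

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
```

With this made-up data, r sits close to zero for the first ten children and becomes strongly positive once the eleventh is added, even though nothing about the other ten has changed.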
In conclusion, I think it is acceptable to remove outliers from your dataset, as they can have a serious effect on the end result, and most of the time that effect is unwarranted. In some cases when outliers occur, it is important that the researcher looks into it: it could be that the participant didn’t understand the task, in which case the researcher should probably reconsider their experiment! 🙂
*Statistics for the Behavioural Sciences, eighth edition