Reliability & Validity: What to Look for When Exploring Pre-Hire Assessments

For decades, recruiters have used psychometric assessments in the workplace. Whether it is a Big Five personality test or a numerical reasoning assessment, the scientific approach to testing applicants has long helped recruiters rank and prioritise candidates. With a new wave of gamified, abstract, and highly immersive experiences to choose from, it is becoming increasingly difficult to know which assessment platform will be most successful in your company.

Nearly every platform available today will highlight the significant differences between its solution and others on the market, and the centrepiece of that pitch is usually its unique scientific approach.

Each provider claims high test reliability and external validity, discussing both at length to reassure hirers that its scientific approach is superior to those they may have used in the past.

And yet, more and more people leaders report consistently negative experiences with assessments. Even after integrating a new tool, recruiters continue to struggle to hire and retain the right people.

For example, suppose you have identified ‘resilience’ as a trait your top performers share. A candidate completes a new assessment, scores highly for ‘resilience’, and is consequently hired. After joining the business, however, they struggle to assimilate into the company or integrate with their team, leading to a premature exit.

The same thing could easily happen with a classic situational judgement test (SJT). An inexperienced candidate applying for a front-of-house position at a hospitality company is asked to complete an ‘off-the-shelf’ SJT. They score highly, but struggle to perform during onboarding and leave the company within the first 60 days. Why does this happen?

Before we can explore whether some forms of assessment are more reliable and valid than others, we should be clear about what reliability and validity actually mean.


Reliability is the degree to which a test produces similar scores each time it is used.

For example, if we weigh a 1 kg weight on the same scale several times and get a different reading each time, we can conclude that the scale is not ‘reliable’.

The same applies to a psychometric assessment. If we evaluate one participant on a specific value or behavioural trait several times using the same method, and each attempt gives a drastically different result, we could reasonably conclude that the assessment method has low reliability.


Validity refers to how meaningful these results, however reliable, are in reality. 

Internal validity is the ability of an assessment to measure what it claims to measure. For example, if we wanted to assess the effect of sugar on a participant’s work rate but used a sweetener in place of sugar, the study would have very low internal validity, because it would not be testing what we set out to investigate.

External validity is the ability of a test’s output to translate, or be generalised, into the ‘real world’ or a wider population.

This is what many assessment platforms offer. The provider, whether it uses a traditional scale, an SJT or a gamified approach, will have rigorously tested its method on a large, diverse population. This allows it to demonstrate the external validity of the assessment and conclude that the output can be generalised to the real world.

Nevertheless, this does not guarantee that the results will translate into your unique work environment, because an assessment designed for a general population may have low ecological validity in your specific setting.

Ecological and external validity are sometimes confused, because both describe an assessment’s ability to translate into the real world. The difference is that an assessment has high ecological validity when its results can be accurately translated into a specific environment or situation.

An experiment that is high in ecological validity will reflect as much of the specific environment being measured as possible. The designer aims to reproduce as many of that environment’s ecological factors as they can, in order to elicit more representative, and therefore more predictive, responses from participants.

How this relates to pre-employment assessments

Traditional psychometric models, developed from pre-existing indexes of characteristics, personalities, and trait values, have been shown to be reliable. The difficulty of demonstrating validity in a specific business lies in how these models are applied. Even if a platform has high external validity, that does not necessarily mean an assessment’s output will translate into your company or improve your recruitment.

For example, situational judgement tests (SJTs) aim to assess how candidates would respond to situations they may encounter in the workplace or role. A candidate’s answers to a set of multiple-choice questions map, to varying degrees, onto the behaviours, characteristics or values the company is looking for in its employees. One example is an SJT that assesses a candidate’s ability to work in customer services. An ‘off-the-shelf’ test will broadly assess abilities such as objection handling, information recall, verbal reasoning and numerical ability, amongst other job-specific skills.

However, this method can raise several issues. When it comes to assessing values, the assessment developer may understand a value completely differently from the way your business does. Off-the-shelf questions ignore the contextual and cultural differences between companies. In addition, the questions do not account for many ecological factors, such as team culture, working styles, values, customer demographics or internal processes, which can drastically affect how a participant behaves in your company. Just because someone performs well in a customer service role at one company does not mean they will at another.

An extension of this approach is to develop an SJT within the unique context of a specific business. The assessment is designed to reflect the company’s culture, language, brand, and values, measuring each candidate’s capability to work in customer services against what ‘great’ looks like in your specific company.

This method should yield much higher ecological validity, because the assessment captures the realities of life in the business and therefore elicits more representative responses from candidates. The output better reflects how a participant is likely to behave in the workplace, which makes the data much more likely to serve your assessment goals.


Hopefully this short article has helped make a little more sense of reliability and validity, so that you can approach pre-employment assessments with more confidence. When evaluating which platform will bring you the most value, it’s vital to ask yourself what matters most for your needs. If you want your assessment results to translate into real-world performance, an off-the-shelf solution, although easy to implement, may not add the most value.
