Globalisation and the increasing need for speed and efficiency in test administration and subsequent decision making have driven a marked increase in the use of online psychological testing within organisational settings in recent years. One of the major global test providers, SHL, has indicated that 95 per cent of its testing is now conducted via online platforms rather than through the traditional paper-and-pencil method. The 2012 Global Assessment Trends Report (Fallaw, Kantrowitz & Dawson, 2012) reports that at least 64 per cent of the human resources survey respondents indicated that their companies allow ‘remote’ testing. Such remote testing involves online testing in a location without direct supervision (‘unproctored’) by a test administrator, with locations including the test taker’s home, a library or a coffee shop. There is widespread support (e.g., Hambleton, 2010) for the view that within five to ten years all psychological testing, apart from certain clinical and neuropsychological testing, will be conducted online. This article focuses on issues associated with the use of online testing for personnel selection within a recruitment context, but it also has relevance for other psychological testing applications.
While there may not be universal agreement on the benefits of online testing, it appears to be well accepted that it offers advantages in terms of cost, volume, efficiency, global reach and standardisation (see, for example, Tippins, 2009). Additional benefits should include security (of test materials and scoring protocols) and flexibility (e.g., item format, and accessibility to a given test offering different language versions and norms). Publishers can ensure test administrators have access to the most up-to-date test versions, norms and manuals. In essence, online testing can lead to better, faster and cheaper assessment outcomes, particularly where large volumes of test takers exist.
The effectiveness of online testing is underpinned not only by advances in technology and test development, but also by the fundamental importance of general mental ability as a key predictor of job performance (Schmidt & Hunter, 1998). With the advancement of item response theory, the early use of computers as ‘page turners’ has been supplanted increasingly by two methods for online ability testing: linear-on-the-fly testing (LOFT), in which items are drawn at random from a large item bank to form a test of fixed length; and computerised adaptive testing (CAT), which can result in a relatively short test, with item selection from a large item bank depending on the test taker’s pattern of responses. The test concludes once a prescribed threshold for the standard error of measurement is reached.
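The adaptive mechanism described above can be illustrated with a minimal sketch. This is not any publisher's actual algorithm: it assumes a simple Rasch (1PL) item response model, a hypothetical item bank keyed by difficulty, and a caller-supplied `answer_fn` standing in for the test taker.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def run_cat(item_bank, answer_fn, sem_threshold=0.4, max_items=30):
    """Minimal CAT loop: administer the most informative remaining item,
    re-estimate ability, and stop once the standard error of measurement
    (SEM) falls below the prescribed threshold (or the bank runs out)."""
    theta, sem = 0.0, float("inf")
    administered, responses = [], []
    available = dict(item_bank)          # item_id -> difficulty (b)
    while available and len(administered) < max_items:
        # Under the Rasch model an item is most informative when b ~= theta
        item_id = min(available, key=lambda i: abs(available[i] - theta))
        b = available.pop(item_id)
        responses.append(bool(answer_fn(item_id, b)))
        administered.append(b)
        # One Newton-Raphson step toward the maximum-likelihood estimate
        info = sum(rasch_p(theta, d) * (1.0 - rasch_p(theta, d))
                   for d in administered)
        score = sum(r - rasch_p(theta, d)
                    for r, d in zip(responses, administered))
        theta += score / info
        sem = 1.0 / math.sqrt(info)
        if sem < sem_threshold:
            break
    return theta, sem
```

The design point is visible in the loop: each response tightens the ability estimate, so a CAT can reach a target precision with far fewer items than a fixed-length test.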
The issues and challenges of online psychological testing can be considered under the following themes.
As we move into alternative forms of testing and assessment (e.g., video-based testing and assessment at US Customs, as described by Cucina, Busciglio, Thomas, et al. (2011)), a fundamental question needs to be addressed: “What are we really measuring?” In converting a paper-based test to an online format, appropriate piloting and even simulation needs to be conducted, with consideration of differential item functioning.
An issue typically raised with unproctored internet testing (UIT) is that of cheating. ‘Speeded’ high-stakes cognitive tests appear to be partially buffered from cheating, with ‘power’ tests likely to be more vulnerable. Surrogates may undertake the test, although test taker authentication can also be an issue for traditional testing.
The emerging field of data forensics addresses the need to prevent and detect inappropriate test taking behaviour. US-based organisations such as Kryterion offer real-time analysis of online responses so that unusual patterns can be detected (e.g., fast latencies on difficult items), and keystroke analytics can be used to authenticate test taker identity. Nevertheless, cheating can still occur.
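The kind of pattern check mentioned above (fast latencies on difficult items) can be sketched in a few lines. This is a toy rule, not any vendor's method, and the field names and cutoffs are illustrative only.

```python
def flag_fast_latencies(responses, fast_cutoff_s=3.0, hard_item_b=1.0):
    """Flag correct answers to difficult items that arrive implausibly
    quickly. A toy stand-in for real-time data forensics: in practice
    cutoffs would be derived empirically per item, not fixed constants."""
    return [r["item_id"] for r in responses
            if r["correct"]
            and r["difficulty"] >= hard_item_b
            and r["latency_s"] < fast_cutoff_s]
```

A flagged response is evidence for follow-up (e.g., verification testing), not proof of cheating on its own.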
So what percentage of test takers cheat on online tests? Jing, Drasgow & Gibby (2012) claim that the estimated base rate of cheating is low, although the extent of cheating is probably influenced by the perceived selection ratio and the average level of item difficulty. Similarly, Weiner & Rice (2012) contend that (only) 5-10 per cent of UIT scores were unconfirmed in subsequent verification testing. However, what level of confidence is required for a practitioner to conclude that an individual has cheated when a verification score is statistically different from the original UIT score?
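One conventional way to frame the confidence question above is a one-tailed test on the difference between the two scores. The sketch below is an assumption about how a practitioner might operationalise it (it assumes both scores share the same standard error of measurement), not a prescribed procedure.

```python
import math

def verification_flag(uit_score, verify_score, sem, z_crit=1.645):
    """One-tailed z-test: is the supervised verification score
    significantly LOWER than the original UIT score? The standard
    error of a difference between two scores with equal SEM is
    sqrt(2) * SEM. z_crit = 1.645 corresponds to a 5% one-tailed test."""
    se_diff = math.sqrt(2.0) * sem
    z = (uit_score - verify_score) / se_diff
    return z > z_crit, round(z, 2)
```

Even a 10-point drop on a scale with SEM of 3 only just clears the 5 per cent criterion, which underlines how cautious any ‘cheating’ inference needs to be.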
One benefit of online testing is the standardisation of test materials, and the work flow processes associated with the testing. However, there can be variability within UIT, including not only the testing environment, but also the quality and suitability of the device display and the technology in general.
While technology enhancements reduce the impact of the latter issue, candidates are prepared to undertake testing under what appear to be non-optimal conditions. Morelli (2012) presented data from over 900,000 applicants for customer support roles, with a small percentage undertaking the testing via a game console. Furthermore, applications are now available for the completion of personality tests on mobile devices. The test delivery mechanism and the testing environment would appear to be potential sources of error in obtained test scores.
Online testing can be a very efficient vehicle for culling a large group of candidates to produce a small group for subsequent, more intensive assessment. It can demonstrate good utility. However, problems may occur through lack of contact with the test taker and a failure to appreciate the context of the testing and any special factors associated with the testing activity or the test taker. The potential danger with online testing is that the ‘number’ (or profile from a computer generated report) becomes the ‘person’ in the eyes of some test users, particularly if they are unskilled in notions such as measurement error, confidence intervals, and the nature of the norms being used. Bear in mind that this concern also exists with traditional testing (see Aldhous, 2012, for a recent example involving the WAIS).
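The measurement error point can be made concrete with a standard confidence band around an observed score. The function below is a simple illustration using the usual normal approximation, not a feature of any particular reporting platform.

```python
def score_band(observed, sem, z=1.96):
    """95 per cent confidence band around an observed test score,
    using the standard error of measurement (SEM). A reminder that
    the 'number' a report prints is an estimate, not the person."""
    return observed - z * sem, observed + z * sem
```

For example, an observed score of 100 with an SEM of 5 spans roughly 90 to 110, which is a very different message from a single printed number.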
‘Touch’ provides qualitative information which can increase the variability in the assessment (‘noise’), but it can also provide rich information regarding the individual and how they function (‘signal’).
Much of what constitutes good online psychological testing practice mirrors good testing practice in general. The following guidelines reflect good testing practice, with particular relevance to online testing.
The International Test Commission has produced a range of documents relevant to testing in general, and online testing in particular. A key document is the International Guidelines on Computer-Based and Internet Delivered Testing (available from www.intestcom.org/guidelines). Future refinements to these documents will add case studies to assist practitioners in understanding what constitutes ‘good practice’. A range of quality online testing resources has also been developed by the British Psychological Society through its Psychological Testing Centre (www.psychtesting.org.uk).
The APS has also produced guidelines to assist practitioners, the Guidelines for Psychological Assessment and the Use of Psychological Tests (available from www.psychology.org.au/practitioner/essential/ethics/guidelines/). This comprehensive document, first produced by the APS in 1997, is currently being updated by members of the recently formed APS Tests and Testing Reference Group, alongside the development of other resources on psychological assessment.
At the recent International Test Commission conference in Amsterdam, the opening state-of-the-art speech was titled ‘The Evolution of Assessment: Simulations and Serious Games’ and fitted neatly with the conference theme ‘Modern Advances in Assessment: Testing and Digital Technology, Policies and Guidelines’. Organisations appear increasingly attracted to assessment activities incorporating computerised animation, which offers increased flexibility, customisation and lower bandwidth requirements compared with standard video simulations. The assessment can be meshed with the organisation’s branding and target marketing initiatives in online recruitment and selection activities, including graduate recruitment.
The demand for test applications for mobile devices is increasing, and this is driven primarily by test takers rather than test publishers. Organisations are likely to respond to such demands, with a recent survey of human resources personnel indicating that only 23 per cent thought it would be unfair to allow candidates to complete assessments via smart phones (Fallaw et al., 2012). The future lexicon in selection testing (and already existing within learning and development) may well include terms such as gamification, avatars and multi-media simulations. Nevertheless, reliability and validity will not fade in importance.
Online testing has changed the nature of testing for psychologists, particularly those working in the organisational and educational fields. While the Australian Defence Force decided not to adopt UIT several years ago (Hinton, 2005), there is increasing pressure on organisational decision makers (and psychologists) to do so, given its utility. For example, it has been reported that the US Office of Personnel Management, which is responsible for delivering testing services to US federal agencies nationwide, has been instructed to introduce UIT.
As psychologists, we have a responsibility to our clients. We can enhance our skills and knowledge by drawing on the resources available to us from bodies such as the International Test Commission and the Psychological Testing Centre, as well as the training provided by professional publishers. The APS, through the Tests and Testing Reference Group, is in the process of developing web-based resources to assist psychologists in being more effective in testing and assessment activities, including online testing.
The author can be contacted at firstname.lastname@example.org