The history of psychological assessment is almost as old as the development of psychology itself as a profession. Psychological assessment can therefore reasonably be considered among the oldest and most researched domains of psychological practice. The particular use of psychological assessment for recruitment and selection is itself a century old, since major publications dealing with the practice had emerged by the time of the First World War. Since then, many of the more general zeitgeist and ideological influences on the profession have frequently found expression in the field of psychological assessment, which has variously been acclaimed by its most ardent supporters and excoriated by its most trenchant critics. It is not the intention of this contribution to repeat these well-worn arguments, which are covered in most significant books on individual differences. Rather, this article surveys some of the significant ongoing and newer issues relevant to the field of psychological assessment in recruitment and selection. Although the focus is on psychological testing as distinct from other assessment techniques such as in-baskets, ratings, field evaluations and so on, no specific attention will be given to particular measures.
Psychologists’ clients have been willing to pay literally millions of dollars for psychological assessment to assist in recruitment and selection decisions on the basis of the following line of reasoning.
[Figure: Rationale for using psychological assessment in recruitment]
The implementation of this general logic for the use of psychological assessment in recruitment and selection has, over the last century of theory, research and practice, proved to be increasingly complex, multidimensional, nuanced, at times elusive and even on occasion illusory. However, it continues to be one of the most active areas of intellectual endeavour in the psychology profession, as the following discussion of salient issues illustrates.
From the development of the earliest tests, psychologists were confronted with the dilemma of having constructed assessment tools which were quickly found to be practically useful but were difficult to define in terms of exactly what was being measured. Labels and definitions have abounded ever since and it is still not hard to find circular definitions such as the following (Hampson, 2011) in the literature:
… personality processes may be defined as actions or reactions over time that produce outcomes associated with personality constructs. (p.317)
Moreover as Deary (2012) notes:
Some key issues that Spearman and Binet addressed are still lively topics of research: along which dimensions of mental abilities do people differ? Do these differences matter? And what are the causes of such differences? (p.454)
However, there has been major taxonomic progress in the fields of intelligence and personality assessment, primarily through research and meta-analysis. Of course there remain some differences of opinion, but there appears to be general agreement about the importance of the hierarchical model of intellectual ability – a general reasoning factor, group factors and specific domain factors. Research findings support the predominant importance of general reasoning in accounting for criterion variance, with domain factors accounting for comparatively little (Deary, 2012).
With respect to personality, the Five Factor Model – a framework of traits involving Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism – has gained wide acceptance as a result of many meta-analytic publications. A very impressive amount of research data has now accumulated relating to the model and its respective measures. It has also been shown to have transcended the traditional lexical and questionnaire methodological divide of past personality measurement endeavours. However, variations of the model have emerged, including the Big Three Model and the Six Factor Model. The latter, which includes the additional dimension of Honesty/Humility, may become more important in the selection field with the ongoing interest in counterproductive employee behaviour.
The comparatively new assessment domain of emotional intelligence amply illustrates some of the ongoing difficulties of definition in the use of psychological assessment in selection. Simply because there is a label which attracts attention, does this really represent a domain of assessment distinct from those typically employed in psychological assessment in selection? One of the first and continuing problems is whether whatever is being measured is actually a personality trait or a reasoning ability. These variations in how emotional intelligence is conceived are reflected in the differing ways in which it has been assessed.
While there is abundant evidence (e.g., Strenze, 2007) for the predictive validity of general ability measures for education level (of the order of .56), occupation (of the order of .45) and income (of the order of .23), there remain a number of issues associated with both the criteria being predicted and the predictors themselves.
Traditionally, job analysis has focused on specific positions in terms of work activities, worker attributes and work context. In terms of work activities, differences in the predictive validity of differing personality traits according to job stage have been reported, suggesting that the attributes relevant when learning new tasks (the transitional stage) are distinct from those relevant when performing ongoing routine tasks (the maintenance stage).
For worker attributes, increasing attention is being paid to the differing demands of ‘normal’ (everyday duties) and ‘abnormal’ (unusual duties) work situations. Certain occupations may occasionally require responses to critical incidents or particular pressures, such as deadlines or confrontations, so the research literature and professional practice increasingly focus on identifying not only competence for most work tasks but also potential vulnerability to poor emotional responses to unexpected, exceptional and stressful work situations. This can have major implications for helping workers avoid conditions such as posttraumatic stress disorder and for saving employers costly workers compensation claims.
For work context, in addition to identifying organisational culture and appropriate candidate personality traits that may be related to it, increasing attention is being paid to the influence of work context factors that may affect predictive validity. It is well known that situation variables moderate the relationship between assessed traits and behaviour. Researchers have started to investigate situational strength variables, such as constraints and consequences, in order to obtain a better understanding of the influences on the predictive validity of personality measures and work performance, organisational commitment and job satisfaction.
Effort is also being directed toward improving the predictors in the predictor-criterion match by assessing traits in ways other than traditional questionnaire approaches. Sometimes called ‘implicit’ measures, these approaches seek to assess identified traits such as those of the Five Factor Model through either situational judgement tests or conditional reasoning tasks. The latter present scenarios in which judgements and justifications must be made, from which various trait differences are inferred. Some evidence suggests that such measures are less obvious in their purpose and therefore may be more resistant to faked responses.
Response distortion such as faking has frequently been cited as having an adverse impact on the predictive validity of personality questionnaires in the selection context. Such distortion has been found to affect various selection techniques, such as structured interviews and assessment centre exercises, but the research data indicate that it is most influential for self-report questionnaires. Although various response scales are often employed to detect or correct response distortion effects, it has generally been found that such procedures are not particularly effective. There are some data to suggest that this issue is more of a problem when selecting in on the basis of high performance than when screening out on the basis of low performance, since in the latter case honestly answering candidates would not be excluded at the expense of those faking good.
Other strategies to reduce faking effects are being investigated, such as warning applicants against faking, verifying responses against other data and forced-choice item formats. Another approach is to treat the results of personality scales on which faking is known to occur as a form of presentation behaviour. In that case, regardless of whether faking has occurred, positive presentation behaviour relevant to the selection criteria can still be viewed as useful in indicating candidates’ ability to respond appropriately to a work-relevant situation.
In considering screening out candidates for selection purposes, recent attention has been given to predicting performance failure in contrast to predicting performance success. It has been known for a long time through regression studies that intelligence measures are better at predicting academic failure than academic success – the so-called ‘twisted pear’ of score variation. Recently, however, this approach has been applied to predicting managerial incompetence through personality assessment as part of a process of screening out candidates for executive positions.
Hogan and Kaiser (2005) identify so-called ‘dark side’ dimensions such as cautious, excitable and reserved as offering some apparent short-term strengths in managerial performance but revealing substantial and potentially costly weaknesses in the longer term. Interestingly, the authors proceed to point out that although high scores on these dimensions are correlated with managerial incompetence, low scores on such dimensions do not predict managerial competence. Instead, managerial success is associated with optimal levels of such dimensions – thus, for example, someone high on ‘cautious’ is likely to be risk averse and too conservative as a manager, while someone very low on the same scale might be reckless and injudicious, which would also be undesirable.
Directly related to the previous issue is the fact that not all predictive relationships in the selection process are linear. Traditional regression equations assume linearity between predictors and criteria, but as the issue above demonstrates, this need not always be the case, especially when the predictors are non-cognitive dimensions. For example, curvilinear relationships between predictors and performance are well known for test anxiety and examination performance: too much or too little anxiety has a detrimental impact on performance, whereas sufficient anxiety to heighten awareness and energy levels promotes it. An increasing number of research studies have investigated dimensions of the Big Five personality traits, indicating curvilinear relationships of Agreeableness and Openness with job performance that were not previously observed using linear regression models.
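The statistical point here can be made concrete with a small simulation. In the sketch below (simulated data only; the trait, effect size and sample size are all invented for illustration), a purely linear regression reports a near-zero slope for an inverted-U trait-performance relationship, while adding a quadratic term recovers the curvilinearity:

```python
# Illustrative sketch with simulated data: a quadratic term can reveal a
# curvilinear trait-performance relationship that a linear model misses.
import numpy as np

rng = np.random.default_rng(0)
trait = rng.uniform(-2, 2, 500)                 # standardised trait score
# Inverted U: performance peaks at moderate trait levels.
performance = -0.6 * trait**2 + rng.normal(0, 0.5, 500)

lin = np.polyfit(trait, performance, 1)         # [slope, intercept]
quad = np.polyfit(trait, performance, 2)        # [x^2 coef, slope, intercept]

print("linear slope:", round(lin[0], 2))            # near zero: effect missed
print("quadratic coefficient:", round(quad[0], 2))  # clearly negative: inverted U
```

The linear slope is close to zero because the positive and negative halves of the inverted U cancel, which is precisely why such relationships went unobserved in linear regression models.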
In further efforts to improve the predictive validity of attribute measures for selection purposes, test researchers have investigated whether contextualising items improves such validity. Often personality measures developed for clinical purposes have been used in recruitment and selection. Such scales typically have non-contextualised items such as “I enjoy leading a team”. Recent research suggests that better predictive validity for selection purposes is obtained by using contextualised items such as “At work I enjoy leading a team”, presumably because the situational context may be a moderating variable. For example, someone might like leading a sporting team where only a game is at stake but not like the responsibility of leading a work team where there may be employment and financial pressures.
It is rarely the case that psychological assessment is the only type of data collection used in the selection process. Other sources of information, such as biodata, structured interviews, work sampling tasks, assessment centre evaluations, and co-worker and referee evaluations, are frequently used in varying combinations along with psychological assessment at varying times in the process. Research on what is termed ‘incremental validity’ finds evidence to support the use of varying combinations of data collection techniques including psychological assessment; however, most individual research studies to date tend to confound methods of data collection with the content of the data collected. Further research and evaluation efforts are required to overcome this confounding factor.
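Incremental validity is conventionally quantified as the gain in explained criterion variance (ΔR²) when a second predictor is added to a regression model. The following sketch illustrates the calculation; the predictors, their overlap and their effect sizes are simulated assumptions, not values from any cited study:

```python
# Illustrative sketch with simulated data: incremental validity as the
# gain in R-squared when a second predictor joins the regression model.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
ability = rng.normal(size=n)
interview = 0.3 * ability + rng.normal(size=n)   # partly redundant with ability
performance = 0.5 * ability + 0.3 * interview + rng.normal(size=n)

def r_squared(predictors, y):
    """R-squared of an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_ability = r_squared([ability], performance)
r2_both = r_squared([ability, interview], performance)
print(f"R-squared, ability alone: {r2_ability:.2f}")
print(f"Incremental validity (delta R-squared): {r2_both - r2_ability:.2f}")
```

Because the simulated interview score partly overlaps with ability, its incremental contribution is much smaller than its zero-order validity – mirroring the confounding-of-method-and-content caution raised above.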
In a crucial sense all person-work matching or fit models are misleading, since they assume a level of stability within the person, the work and the context in which the matching occurs which really does not exist. My own work with Jim Bright (Pryor & Bright, 2011) in applying chaos theory to career development, along with the influences of complexity theory and systems theory, indicates that while there is recognisable stability in the matching process, there is also dynamical (that is the right word when talking of systems) interaction as well. Chaos theory as applied to career development focuses attention on complexity, connection, change and chance, resulting in an interplay of stability and instability in people’s lives and careers. People change when they commence a new job, they may change the nature of the job once ensconced, they may change the dynamics of the work interactions and indeed they may even be employed to be agents of change in the work context. In the personnel selection field this has been called the study of 'dynamic performance' and as Sackett & Lievens (2008) observe:
Creatively solving problems, dealing with uncertain work situations, cross-cultural adaptability and interpersonal adaptability are dimensions that have been researched in recent years. (p.432)
Most of the research studies to date relating personality traits such as Openness and Emotional Stability to change have involved concurrent validity data. Some results indicate significant positive relationships between such traits and organisational change, cross-cultural training performance, handling uncertain work situations and creative behaviour. It has been observed that effective job performance in changing contexts involves a level of flexibility, including the ability to 'unlearn' old strategies as well as to learn new tasks. It seems hard to believe that working will not involve more complexity, interconnection, unpredictability and change in the future. It is therefore likely that demands for the skills and attributes to deal with such challenges will constitute one of the major areas of thinking and research in the recruitment and selection field in the years to come.
Perhaps the most controversial issue in the field of psychological assessment for recruitment is that of the differential validity of ability measures across racial or minority groups. This issue has significant political, sociological and economic dimensions which have at times generated more heat than light. Various reviewers using meta-analysis have reached differing conclusions about the meaning of the available data. In the United States, which has much larger minority populations, this issue has ended up in the courts, which has caused organisations to seek from test developers more equitable ways to deal with the cognitive score differences that appear to favour Whites and Asians over Blacks and Hispanics. In the past, test constructors have responded with strategies that typically involve a 'validity-fairness trade-off', including seeking to eliminate items differentiating between racial groups, race-norming, coaching programs before testing, increasing time limits for test completion and improving test-taker motivation. In general these strategies have been only minimally successful at best.
New approaches being advocated now include conducting adverse impact analyses as part of the job analysis procedure, expanding the criteria to include citizenship and counterproductive behaviour, including predictors beyond the cognitive domain, test score banding such that all scores within a specified error range are considered to be equivalent, and using differential multiple regression weights for a set of predictors to control for minority group disadvantage. All of these proposals continue to have their supporters and their detractors. To some extent, responses to this seemingly intractable issue come down to questions of values which could be rather crudely formulated as: in the procedures of recruitment and selection, is it fair, reasonable and efficient to sacrifice statistical validity evidence for increased minority group equality of opportunity? This debate and attempted solutions to resolve it, will continue to be a source of attention for both those working in the field and those outside of it, ready to latch onto support for their own particular agenda.
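Of the proposals listed above, test score banding is the most readily illustrated. In the sketch below (the reliability, score standard deviation and candidate scores are all assumed values for illustration), scores within a 95% confidence band of the top score, defined by the standard error of the difference between two scores, are treated as statistically equivalent:

```python
# Illustrative sketch with assumed values: test score banding treats scores
# within a band defined by the standard error of the difference (SED) of
# two scores as equivalent, rather than strictly rank-ordering candidates.
import math

reliability = 0.85    # assumed test reliability
sd = 15.0             # assumed score standard deviation

sem = sd * math.sqrt(1 - reliability)   # standard error of measurement
sed = sem * math.sqrt(2)                # SE of the difference between two scores
band_width = 1.96 * sed                 # 95% confidence band below the top score

candidates = {"A": 128, "B": 121, "C": 110}   # hypothetical scores
top = max(candidates.values())

# Candidates whose scores cannot be reliably distinguished from the top score.
equivalent = [name for name, s in candidates.items() if top - s <= band_width]

print(f"band width: {band_width:.1f}")
print("treated as equivalent to the top scorer:", equivalent)
```

Under these assumptions the band spans about 16 points, so the second-ranked candidate falls inside it while the third does not – which is exactly why banding is contested: it widens the pool of 'equal' candidates at the cost of ignoring observed score differences.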
The demand characteristics of testing tasks have long been known to influence test scores, as, for example, in timed compared with untimed administration procedures. However, the utility of measures also relates to ease of administration, since this often constitutes a major issue in selection, especially when large numbers of applicants may be involved. Internet-based testing has been used widely, especially by organisations, usually by computerising existing measures. However, as Sackett & Lievens (2008) observe, justification is rarely given for whether or not research was undertaken to establish the equivalence of the computerised version with the printed version. This is a significant issue, since data from a limited number of research studies suggest that for non-cognitive tests there can be significant differences in score distributions and levels of internal consistency when tests are computerised. The issue has been neglected by researchers to date, presumably on the assumption that the demand characteristics of computerised and printed versions of psychological tests are equivalent. The validity of that assumption remains to be adequately evaluated across tests of both abilities and attributes used for recruitment and selection purposes. Moreover, to date the test presentation and feedback possibilities of internet-based assessment have received comparatively little serious attention.
Other issues associated with internet-based assessment that continue to be worked on include test security and user identification. A neglected issue, however, is whether the presence or absence of a test administrator affects the results obtained (so-called 'unproctored testing'). These issues require further investigation, since administrative convenience, important though it is, cannot override issues of validity in the use of psychological assessment in recruitment and selection.
This review of psychological assessment issues in recruitment and selection cannot make any claim to be exhaustive. While readers can draw their own conclusions about the current state and future directions of the field, the following general conclusions are justifiable.
All major psychological practices and techniques have their critics and that is always a useful state of affairs, since it spurs those in the field to seek more evidence, improve conceptualisations and address the limitations of service delivery. Uncritical acceptance is the bane of any profession. Psychological assessment in general, and with reference to recruitment and selection in particular, continues to be one of the profession’s most controversial and contested domains, as it always has been. However, perhaps the greatest testimony to its contribution to both the profession and to the wider society, is not that it has endured despite the ire of its critics but that it has survived the often politically and ideologically motivated support of some of its most enthusiastic advocates.
Those wishing to investigate the issues outlined in this article further are encouraged to use as a starting point the journals Annual Review of Psychology, Journal of Applied Psychology and Personnel Psychology as sources of recent thinking and research.
The author wishes to acknowledge the interest and support of his colleagues Professor Jim Bright FAPS and Winston Horne MAPS, in the preparation of this article.